Preface



This volume contains the papers presented at the 20th International Conference
on Logic Programming, held in Saint-Malo, France, September 6-10, 2004. Since 
the first meeting in this series, held in Marseilles in 1982, ICLP has been the 
premier international conference for presenting research in logic programming. 
This year, we received 70 technical papers from countries all over the world, and 
the Program Committee accepted 28 of them for presentation; they are included 
in this volume. 

A stand-by-your-poster session took place during the conference. It served as 
a forum for presenting work in a more informal and interactive setting. Abstracts 
of the 16 posters selected by the Program Committee are included in this volume 
as well. 

The conference program also included invited talks and invited tutorials.
We were privileged to have talks by three outstanding researchers and excellent
speakers: Nachum Dershowitz (Tel Aviv University, Israel) talked on Termination
by Abstraction, Michael Gelfond (Texas Tech University, USA) on Answer
Set Programming and the Design of Deliberative Agents, and Gérard Huet
(INRIA, France) on Non-determinism Lessons. Two of the invited talks appear
in these proceedings. The tutorials covered topics of high interest to the logic
programming community: Ilkka Niemelä gave a tutorial on The Implementation
of Answer Set Solvers, Andreas Podelski on Tree Automata in Program Analysis
and Verification, and Guillermo R. Simari on Defeasible Logic Programming and
Belief Revision.

Satellite workshops made the conference even more interesting. Six workshops
were collocated with ICLP 2004:

- CICLOPS 2004, Colloquium on Implementation of Constraint and Logic
Programming Systems, organized by Manuel Carro.

- COLOPS 2004, 2nd International Workshop on Constraint & Logic Programming
in Security, organized by Frank Valencia.

- MultiCPL 2004, 3rd International Workshop on Multiparadigm Constraint
Programming Languages, organized by Petra Hofstedt.

- TeachLP 2004, 1st International Workshop on Teaching Logic Programming, 
organized by Dietmar Seipel. 

- WLPE 2004, 14th Workshop on Logic Programming Environments, organized
by Susana Muñoz-Hernández and José Manuel Gómez-Pérez.

- PPSWR 2004, Workshop on Principles and Practice of Semantic Web Reasoning,
organized by Hans Jürgen Ohlbach and Sebastian Schaffert.

The traditional Prolog Programming Contest was organized by Tom
Schrijvers and Remko Tronçon.







We take this opportunity to thank the many people who helped in the processing
of the submissions and in the local organization of the conference. In
particular we are grateful to Tristan Denmat, Ludovic Langevine, Elisabeth Lebret,
Joohyung Lee, Lydie Mabil, Matthieu Petit, Olivier Ridoux and Benjamin
Sigonneau. We also thank the organizers of the workshops and the Prolog
Programming Contest.

Finally, we thank all the authors of the submitted papers, the Program Committee
members, and the referees for their time and effort spent in the reviewing
process, Pål Halvorsen and Ketil Lund for their conference management software,
ConfMan, and the conference chair Mireille Ducassé and her équipe in
Rennes for the excellent organization.


Termination by Abstraction



Nachum Dershowitz*

School of Computer Science, Tel-Aviv University,
Ramat Aviv, Tel-Aviv 69978, Israel
nachum.dershowitz@cs.tau.ac.il



Abstract. Abstraction can be used very effectively to decompose and
simplify termination arguments. If a symbolic computation is nonterminating,
then there is an infinite computation with a top redex, such that
all redexes are immortal, but all children of redexes are mortal. This suggests
applying weakly-monotonic well-founded relations in abstraction-based
termination methods, expressed here within an abstract framework
for term-based proofs. Lexicographic combinations of orderings may be
used to match up with multiple levels of abstraction.



A small number of firms 
have decided to terminate 
their independent abstraction schemes. 

- Netherlands Ministry of Spatial Planning, 
Housing and the Environment (2003) 



1 Introduction 

For as long as there have been algorithms, the question of their termination 
- though undecidable, in general - has had to be addressed. Not surprisingly, 
one of the earliest proofs of termination of a computer program was by Turing 
himself [43], mapping the program state to the ordinal numbers. 

Floyd [22] suggested using arbitrary well-founded (partial) orderings; this
direction was developed further by Manna [34]. Such a termination proof typically
involves several steps:

1. Choose an appropriate well-founded set. 

2. Choose a set of points in each potentially infinite computation at which to 
measure progress towards termination. 

3. Establish invariant properties that always hold at those points. 

4. Choose a mapping from states to the well-founded set by which to measure 
progress. 

5. Show a necessary decrease in this measure with each transition from point 
to point. 
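
The five steps above can be illustrated on a small loop. The following is a minimal sketch, assuming subtraction-based gcd as the program (an example chosen here, not one from the text) and the naturals as the well-founded set; the function names are hypothetical. Checking sample runs does not replace a proof of Step 5, it only shows what the proof obligation looks like.

```python
def gcd_cut_points(a, b):
    """Yield the state (a, b) at each visit to the loop head (Step 2)."""
    while a != b:
        yield (a, b)
        if a > b:
            a -= b
        else:
            b -= a
    yield (a, b)


def invariant(state):
    """Step 3: both components stay positive at every cut point."""
    a, b = state
    return a > 0 and b > 0


def measure(state):
    """Step 4: map a state into the naturals (the well-founded set of Step 1)."""
    a, b = state
    return a + b


def decreases_along_run(a, b):
    """Step 5, tested on one run: the measure drops at every transition."""
    states = list(gcd_cut_points(a, b))
    if not all(invariant(s) for s in states):
        return False
    return all(measure(s) > measure(t) for s, t in zip(states, states[1:]))
```

The invariant matters: without positivity of both components, subtracting one from the other would not be guaranteed to lower the sum.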



* Research supported in part by the Israel Science Foundation (grant no. 254/01). 



B. Demoen and V. Lifschitz (Eds.): ICLP 2004, LNCS 3132, pp. 1-18, 2004. 
(c) Springer-Verlag Berlin Heidelberg 2004

For a survey of termination methods for ordinary programs, see [30]¹.

Showing termination of symbolic computations often requires special tools,
since state transitions involve symbolic expressions that may grow bigger and
bigger, while progress is being made towards a final result. Therefore, one often
resorts to powerful term-based orderings, such as have been developed for
rewrite systems [13]. We are mainly interested here in relatively simple symbolic
termination functions, mapping symbolic states to terms, and in sophisticated
methods of showing that they decrease. More complicated symbolic transformations
have been considered, for example in [3,4].

We use rewriting [15, 20, 42] as a prototypical symbolic computation paradigm
(and employ terminology and notation from [20]). A rewrite system is (uniformly)
terminating if there is no term to which its rules can be applied over
and over again forever; see [13]. Narrowing (a unification-based version of rewriting)
has been proposed as a basis for functional-logic programming; see [19,27].
Termination of narrowing has been considered in several works [28,21,6]. Much
effort has also been devoted to devising methods for establishing termination of
logic programs. For a survey, see [10]; a recent dissertation on the subject is [39];
interfaces to several automated tools (cTI, Hasta-La-Vista, TALP, TermiLog,
and TerminWeb) are available over the web. Methods have been suggested for
converting well-moded logic programs into rewrite systems with identical termination
behavior [2,36].

In the next section, we sketch how abstraction is used to decompose termination
proofs. Section 3 introduces notation and monotonicity properties, and is
followed by a section containing some termination methods for rewriting based
on those properties. In Section 5, we look at constricting derivations, which are
used in the following section to design dependency-based approaches, in which
the symbolic state is a “critical” immortal subterm. Correctness of the various
methods and their interrelatedness are the subjects of Section 7. We conclude
with an example.



2 Abstraction 

A transition system is a graph in which vertices are states ($S$) of a computation
and edges ($\leadsto$) are state-transitions, as defined by some program. A computation
1 It is misleading to suggest (cf. [26]) that - for deterministic (or bounded-nondeterministic)
programs - it suffices to use the natural numbers as the well-founded
set (Step 1), claiming that - after all - the (maximum) number of iterations
of any terminating loop is fixed and depends only on the values of the inputs. This
fixation on the naturals begs the real issue, since the proof (Step 5) may require
transfinite induction over ordinals much larger than $\omega$. For example, one can easily
program the deterministic Battle of Hercules and Hydra (or Goodstein sequences)
[31]. Though there exists an integer-valued function that counts how many steps it
takes Hercules to eliminate any Hydra, proving that it is well-defined, and that it
decreases with each round of the battle, provably requires a stronger principle of
induction (viz. $\epsilon_0$) than that provided by the Peano Axioms of arithmetic.

is a sequence $s_0 \leadsto s_1 \leadsto \cdots$ of states, where the arrows represent transitions. We
say that a binary relation $\gg$ (over some $S$) is terminating if there are no infinite
descending sequences $s_1 \gg s_2 \gg \cdots$ (of elements $s_i \in S$). This property will be
denoted SN($\gg$) for “strongly normalizing”. Thus, we aim to show SN($\leadsto$), that
is, that the transition relation $\leadsto$ is terminating, for given transition systems.
To show that no infinite computation is possible, one can make use of any other
terminating relation $\gg$, and show that transitions are decreasing in $\gg$. That
is, we need $s \leadsto s'$ to imply $s \gg s'$, or ${\leadsto} \subseteq {\gg}$, for short.

Abstraction and dataflow analysis can be used to restrict the cases for which
a reduction needs to be confirmed. The underlying idea is that of abstract
interpretation, as introduced by Sintzoff [40], Wegbreit [45], and others, and
formalized by Cousot and Cousot [9]. The property we are concerned with here is
termination. For use of abstraction in termination of logic programs, see [44].

A partial ordering $>$ is well-founded if it is terminating. If $\geq$ is a quasi-ordering
(i.e. a reflexive-transitive binary relation) and $\leq$ its inverse, then we
can use $\sim$ to denote the associated equivalence ($\geq \cap \leq$, viewing orderings as
sets of ordered pairs) and $>$ to denote the associated partial ordering ($\geq \setminus \leq$).
We will say that a quasi-ordering $\geq$ is well-founded whenever its strict part $>$
is. We often use well-founded partial and quasi-orderings in proofs, since they
are transitive. Specifically, we know that $s \geq t > u$ and $s > t \geq u$ each imply
$s > u$.

As is customary, for any binary relation $\rightarrowtail$, we use $\rightarrowtail^+$ for its transitive
closure, $\rightarrowtail^*$ for its reflexive-transitive closure, and $\rightarrowtail^-$ (or $\leftarrowtail$, typography
permitting) for its inverse. If a relation $\succ\!\!\succ$ is terminating, then both its transitive
closure $\succ\!\!\succ^+$ and reflexive-transitive closure $\succ\!\!\succ^*$ are well-founded. In what follows,
we will dedicate the symbol $\succ\!\!\succ$ for terminating relations, $\succ$ for well-founded
partial orderings, and $\succsim$ for well-founded quasi-orderings. The intersection of a
terminating relation with any other binary relation is terminating:

SN($\succ\!\!\succ$) $\Rightarrow$ SN($\succ\!\!\succ \cap \gg$). (1)

It is often convenient to introduce an intermediate notion in proofs of termination,
namely, a “termination function” $\tau$, mapping states to some set $W$,
and show that state transition $s \leadsto s'$ implies $\tau(s) \succ\!\!\succ \tau(s')$, for some terminating
relation $\succ\!\!\succ$. Accordingly, one can view $\tau(s)$ as an “abstraction” of state
$s$ for the purposes of a termination proof. Instead of proving that $\leadsto$ is terminating,
one considers the abstracted states $\tau(S) = \{\tau(s) \mid s \in S\} \subseteq W$ and
supplies a proof of termination for the abstract transition relation $\tau(\leadsto)$, defined
as $\{\tau(s) \leadsto \tau(s') \mid s \leadsto s'\}$.

Suppose the only loops in the abstracted transition graph $\tau(S)$ are self-loops.
That is, $\tau(s) \leadsto^* \tau(s') \leadsto^* \tau(s)$ implies $\tau(s) = \tau(s')$. Then termination can be
decomposed into subproofs for each of the loops and for the remainder of the
graph, sans loops. For the latter, one needs to check that $\tau(\leadsto)$ has no infinite
chains, which is trivially true when the abstract graph is finite. For each of the
self-loops, one needs to reason on the concrete level, but under the assumption
that $\tau$ remains invariant (its value is some constant).
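
For a finite abstract graph, the self-loop condition can be checked mechanically. The sketch below assumes the concrete transitions are given as an explicit finite list of pairs and that the termination function is an ordinary Python callable; both function names are hypothetical. It builds the image of the transitions under the abstraction and then tests that nothing but self-loops closes a cycle.

```python
def abstract_edges(edges, tau):
    """The abstract transition relation: the image of each edge under tau."""
    return {(tau(s), tau(t)) for s, t in edges}


def only_self_loops(abs_edges):
    """True if the abstract graph is acyclic once self-loops are dropped."""
    graph = {}
    for a, b in abs_edges:
        if a != b:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set())
    state = {}  # node -> "active" (on the DFS stack) or "done"

    def visit(n):
        state[n] = "active"
        for m in graph[n]:
            if state.get(m) == "active":
                return False  # found a cycle through distinct abstract states
            if m not in state and not visit(m):
                return False
        state[n] = "done"
        return True

    return all(visit(n) for n in graph if n not in state)
```

When the check succeeds, what remains is one concrete termination argument per self-loop, each carried out with the abstract value held fixed.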
Oftentimes [34], one maps states to a lexicographically-ordered tuple of elements,
a pair $(\tau_1(s), \tau_2(s))$, say. Then one needs to show, separately (if one
wishes), that every transition $s \leadsto s'$ implies $\tau_1(s) \succsim \tau_1(s')$, for some well-founded
quasi-ordering $\succsim$, and that $s \leadsto s'$ and $\tau_1(s) \sim \tau_1(s')$ imply $\tau_2(s) \succ\!\!\succ \tau_2(s')$, for
some terminating relation $\succ\!\!\succ$.
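
A familiar instance of such a lexicographic pair is the argument tuple of Ackermann's function. The sketch below is an illustration chosen here, not an example from the text: it records every recursive call as a transition between argument pairs and checks that the first component never increases and that the second strictly decreases whenever the first is unchanged. The function names are hypothetical.

```python
def ackermann_calls(m, n, calls):
    """Ackermann's function, recording each (caller, callee) argument pair."""
    if m == 0:
        return n + 1
    if n == 0:
        calls.append(((m, n), (m - 1, 1)))
        return ackermann_calls(m - 1, 1, calls)
    calls.append(((m, n), (m, n - 1)))
    inner = ackermann_calls(m, n - 1, calls)
    calls.append(((m, n), (m - 1, inner)))
    return ackermann_calls(m - 1, inner, calls)


def lex_decreases(s, t):
    """First component weakly decreases; on a tie the second strictly decreases."""
    (m1, n1), (m2, n2) = s, t
    if m1 < m2:
        return False
    if m1 == m2:
        return n1 > n2
    return True
```

Note that the second component may grow arbitrarily when the first one drops, which is exactly what the lexicographic combination permits.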

In the symbolic case, the set of ground terms in a computation can be divided 
according to some set of patterns of which they are instances. If there are only 
a finite number of different patterns, and computations do not cycle among the 
patterns, then one only needs to show termination of computations involving a 
single pattern. In logic programming, these can be predicate names and argument 
modes. For rewriting, syntactic path orderings [12, 13], based on a precedence of 
function symbols, are used, but one must consider subterms, as well as the top-level
term. Abstraction is also the essence of the “operator derivability” method
of [21] for pruning unsatisfiable narrowing goals (as used for functional-logic
programming), where terms $f(\cdots)$ are represented by their outermost symbol $f$.
A more sophisticated use of patterns to prune narrowing-based goal solving was 
developed in [6]. 

Example 1. The rewrite system

$\varepsilon @ z \to z$              $\varepsilon = \varepsilon \to T$            $\varepsilon = x{:}y \to F$

$(x{:}y) @ z \to x{:}(y @ z)$      $x{:}y = x{:}z \to y = z$      $x{:}y = \varepsilon \to F$,

for appending and comparing lists, can be used to compute directly by rewriting
(using pattern matching), or can be used to solve goals by narrowing (using unification),
or by their pleasant combination [19]: eager simplification interspersed
between outermost narrowing steps. The goal $z @ (b{:}\varepsilon) = a{:}b{:}\varepsilon$, for example,
where $z$ is the existential “logic” variable being solved for, narrows to the subgoal
$b{:}\varepsilon = a{:}b{:}\varepsilon$ (applying the first rule and assigning $z \mapsto \varepsilon$), which dies (for
lack of applicable rule). The original goal also narrows to $x{:}(z' @ (b{:}\varepsilon)) = a{:}b{:}\varepsilon$
via the bottom-left rule (with $z \mapsto x{:}z'$), which narrows to $z' @ (b{:}\varepsilon) = b{:}\varepsilon$
($x \mapsto a$), which narrows to $b{:}\varepsilon = b{:}\varepsilon$ ($z' \mapsto \varepsilon$), which simplifies to $T$.

Suppose we are interested in solving goals of that form, $z @ A = B$, where
$A$ and $B$ are (fully instantiated) lists. As abstract states, we can take the goal
patterns $\{z @ A = B,\ x{:}(z @ A) = a{:}B\}$, $\{A = B\}$, and $\{T, F\}$, where $a$ can be
any atom. As we just saw, a goal of the form $z @ A = B$ can narrow to one of the
forms $A = B$ and $x{:}(z' @ A) = B$. In the second event, if $B = \varepsilon$, then the goal
simplifies to $F$; otherwise, it is of the form $x{:}(z @ A) = a{:}B$. For the latter goal,
we have $x{:}(z @ A) = a{:}B \leadsto z @ A = B$. All told, the possible abstract transitions
are

          (self-loop)                                    (self-loop)
$\{z @ A = B,\ x{:}(z @ A) = a{:}B\}$   $\longrightarrow$   $\{A = B\}$
                      $\searrow$                 $\swarrow$
                                $\{T, F\}$

Since there are self-loops for the top two pattern sets, a proof of termination
only requires showing that with each trip around these loops some measure
decreases. For the right loop, we measure a goal $A = B$ by the list $B$ under the
sublist (or list length) ordering. For the left loop, we first look at its pattern
[with $z @ A = B \ll x{:}(z @ A) = a{:}B$], and, for like patterns, take $B$. □
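
The rewriting half of Example 1 (computing by pattern matching on fully instantiated terms, without narrowing) can be sketched as follows. The term encoding is an assumption made here for illustration: the empty list is the string "nil", atoms are strings, and constructed terms are tagged triples; all function names are hypothetical. The innermost-first strategy is likewise a choice of this sketch.

```python
NIL = "nil"


def cons(x, y):
    return ("cons", x, y)


def app(s, t):
    return ("app", s, t)


def eq(s, t):
    return ("eq", s, t)


def step_top(t):
    """Apply one rule of the system at the root, or return None."""
    if not isinstance(t, tuple):
        return None
    op, l, r = t
    l_cons = isinstance(l, tuple) and l[0] == "cons"
    r_cons = isinstance(r, tuple) and r[0] == "cons"
    if op == "app":
        if l == NIL:
            return r                              # nil @ z -> z
        if l_cons:
            return cons(l[1], app(l[2], r))       # (x:y) @ z -> x:(y @ z)
    if op == "eq":
        if l == NIL and r == NIL:
            return "T"                            # nil = nil -> T
        if (l == NIL and r_cons) or (l_cons and r == NIL):
            return "F"                            # nil = x:y -> F, x:y = nil -> F
        if l_cons and r_cons and l[1] == r[1]:
            return eq(l[2], r[2])                 # x:y = x:z -> y = z
    return None


def normalize(t):
    """Rewrite innermost-first until no rule applies."""
    if isinstance(t, tuple):
        t = (t[0],) + tuple(normalize(a) for a in t[1:])
    nxt = step_top(t)
    return t if nxt is None else normalize(nxt)
```

A goal such as b:nil = a:b:nil has no applicable rule, so `normalize` returns it unchanged, which corresponds to the goal that "dies" in the example.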

In general, consider an infinite computation $s_0 \leadsto s_1 \leadsto \cdots$ and let $\tau$ assign
one of finitely many colors to each state. By the Pigeonhole Principle, infinitely
many states must share the same color. Hence, there must be a subcomputation
$s_i \leadsto^+ s_j$ ($i < j$) for which $\tau(s_i) = \tau(s_j)$. So if we show the impossibility of this
happening infinitely often for any color, then we have precluded having infinite
computations altogether.

Rather than just coloring vertices (states) of the transition graph, it is even
better to also color its edges and paths: each subcomputation $s_i \leadsto^+ s_j$ ($i < j$) is
assigned one of a finite palette $A$ of colors. Then, by (a simple case of) the infinite
version of Ramsey’s Theorem, there is an infinite (noncontiguous) subsequence
of states $s_{i_1} \leadsto^+ s_{i_2} \leadsto^+ \cdots$, such that every one of its subcomputations
$s_{i_j} \leadsto^+ s_{i_k}$ has the identical color $a \in A$ [and also $\tau(s_{i_j}) = \tau(s_{i_k})$]. So, to preclude
nontermination, we need only show every such cyclical subcomputation decreases
in some well-founded sense.

As shown in [16, Lemma 3.1], this fact can be applied to the call tree of a 
logic program. (See also the discussion in [7].) This leads to the query-mapping 
method of Sagiv [38, 16] and to similar techniques [8,33]. 



3 Formalism 

Let $F$ be some vocabulary of (ranked) function symbols, and $T$ the set of
(ground) terms built from them. A flat context $\ell$ is a term of the form
$f(t_1, \ldots, t_{i-1}, \Box, t_{i+1}, \ldots, t_n)$, where $f \in F$ is a function symbol of arity $n > 0$, the $t_i$
are any terms, and $\Box$ is a special symbol denoting a “hole”. If $\ell$ is such a flat
context and $t$ a term (or context), then by $\ell[t]$ we denote the term (or context)
$f(t_1, \ldots, t_{i-1}, t, t_{i+1}, \ldots, t_n)$. We will view $\ell$ also as the binary relation
$\{(t, \ell[t]) \mid t \in T\}$, mapping a term $t$ to the term $\ell[t]$, containing $t$ as its immediate
subterm. The inverse of flat $\ell$, with its hole at position $i$, is the projection
$\pi_i$. Let $L$ be the set of all flat contexts (for some vocabulary), and $\Pi = L^-$, the
set of all projections.

A context $c$ is just an element of $L^*$, that is, the relation between any term
$t$ and some particular superterm $c[t]$ containing $t$ where $c$’s hole was. It has the
shape of a “teepee”, a term minus a subterm, so may be represented by a term
$c[\Box]$ with one hole. Let $C = L^* \subseteq T \times T$ denote all contexts; put another way,
$C$ is just the subterm relation $\unlhd$. Its inverse $\unrhd$ is the superterm relation and its
strict part $\rhd$ is proper superterm.
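
These notions have a direct data-structure reading. The sketch below assumes a hypothetical encoding in which constants are strings and compound terms are tuples headed by their function symbol; the hole is a private sentinel object. Plugging a flat context and projecting are mutually inverse, and iterating projections enumerates the subterms.

```python
HOLE = object()  # sentinel standing for the hole of a flat context


def flat_context(f, args, i):
    """f(t1, ..., hole, ..., tn) with the hole at 0-based argument position i."""
    return (f,) + tuple(args[:i]) + (HOLE,) + tuple(args[i + 1:])


def plug(ctx, t):
    """Fill the hole of a flat context with the term t."""
    return tuple(t if a is HOLE else a for a in ctx)


def project(t, i):
    """The i-th immediate subterm (0-based): the inverse of plugging at i."""
    return t[1 + i]


def subterms(t):
    """All subterms of t, including t itself, in preorder."""
    yield t
    if isinstance(t, tuple):
        for a in t[1:]:
            yield from subterms(a)
```

A general context corresponds to a sequence of flat contexts applied one after another, so no separate representation is needed in this sketch.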

A rewrite system is a set of rewrite rules, each of the form $l \to r$, where $l$ and
$r$ are (first-order) terms. Rules are used to compute by replacing a subterm of
a term $t$ that matches the left-side pattern $l$ with the corresponding instance of
the right side $r$. For a rewrite system $R$, viewed as a binary relation (set of pairs
of terms), we will use the notation $\propto_R$ to signify all its ground instances (a set
of pairs of ground terms), and $\to_R$ for the associated rewrite relation (also on
ground terms). The latter is the relevant transition relation.

Composition of relations will be indicated by juxtaposition. If $S$ and $R$ are
binary relations on terms, then by $S[R]$ we denote the composite relation:

$S[R] = S^- R S$,

which takes a backwards $S$-step before $R$, and then undoes that $S$-step. Let $\Gamma$ be
the set of all ground instantiations, where a ground instantiation $\gamma$ is the relation
$(t, t\gamma)$, where $t\gamma$ is the term $t$ with its variables instantiated as per $\gamma$. The inverse
operation $\gamma^{-1}$ is “generalization”, which replaces subterms by variables. With
this machinery in place, the top-rewrite relation (rule application) and rewrite
steps (applying a rule at a subterm) are definable as follows:

$\propto_R = \Gamma[R]$

$\to_R = C[\propto_R]$.

Thus,

$\propto_R = \{l\gamma \propto r\gamma \mid l \to r \in R,\ \gamma \in \Gamma\}$

$\to_R = \{c[l\gamma] \to c[r\gamma] \mid l \to r \in R,\ \gamma \in \Gamma,\ c \in C\}$.

Of course,

${\propto_R} \subseteq {\to_R}$. (2)



Since we will rarely talk about more than one system at a time, we will often 
forget subscripts. 
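
The two relations just defined, rule application at the top and rewriting inside a context, can be sketched operationally. The encoding is an assumption of this sketch: constants are strings, variables are strings beginning with "?", compound terms are tuples headed by their symbol, and a rule is a pair of such terms. All function names are hypothetical.

```python
def match(pat, t, sub):
    """Extend substitution sub so that pat instantiated by it equals t, else None."""
    if isinstance(pat, str) and pat.startswith("?"):
        if pat in sub:
            return sub if sub[pat] == t else None
        return {**sub, pat: t}
    if isinstance(pat, str) or isinstance(t, str):
        return sub if pat == t else None
    if pat[0] != t[0] or len(pat) != len(t):
        return None
    for p, a in zip(pat[1:], t[1:]):
        sub = match(p, a, sub)
        if sub is None:
            return None
    return sub


def instantiate(t, sub):
    """Replace the variables of t as prescribed by sub."""
    if isinstance(t, str):
        return sub.get(t, t)
    return (t[0],) + tuple(instantiate(a, sub) for a in t[1:])


def top_steps(rules, t):
    """Rule application at the root: every instance of a right side reachable from t."""
    for l, r in rules:
        sub = match(l, t, {})
        if sub is not None:
            yield instantiate(r, sub)


def rewrite_steps(rules, t):
    """One rewrite step anywhere in t: a top step wrapped in some context."""
    yield from top_steps(rules, t)
    if isinstance(t, tuple):
        for i in range(1, len(t)):
            for s in rewrite_steps(rules, t[i]):
                yield t[:i] + (s,) + t[i + 1:]
```

Since `rewrite_steps` begins by yielding every top step, the inclusion (2) holds for this sketch by construction.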

Two properties of relations are central to the termination tools we describe:

Mono($\sqsupset$): $\sqsupset \ell \subseteq \ell \sqsupset$
Harmony($\sqsupset$, $\gg$): $\sqsupset \ell \subseteq \ell \gg$

where $\sqsupset$ and $\gg$ are arbitrary binary relations over terms and $\ell$ is an arbitrary
flat context. See the diagrams in Fig. 1. Mono is “monotonicity”, a.k.a. the
“replacement property” (relations are inherited by superterms). Rewriting is
monotonic:

Mono($\to$). (3)

Harmony means that

$s \sqsupset t \Rightarrow \ell[s] \gg \ell[t]$

for all $\ell \in L$ and $s, t \in T$². So, monotonicity of a relation is self-harmony:

Harmony($\gg$, $\gg$) $\Leftrightarrow$ Mono($\gg$). (4)

2 Harmony is called “quasi-monotonicity of $\gg$ with respect to $\sqsupset$” in [5].

Fig. 1. Monotonicity and harmony.



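Clearly:

Mono($\gg$) $\wedge$ Mono($\sqsupset$) $\Rightarrow$ Mono($\gg \cap \sqsupset$) (5)
Mono($\gg$) $\Rightarrow$ Mono($\to \cap \gg$) (6)
Mono($\gg \cap \sqsupset$) $\Rightarrow$ Harmony($\gg \cap \sqsupset$, $\gg$) (7)
Harmony($\rightarrowtail$, $\gg$) $\wedge$ Harmony($\rightarrowtail$, $\sqsupset$) $\Rightarrow$ Harmony($\rightarrowtail$, $\gg \cap \sqsupset$), (8)

for any relations $\gg$, $\sqsupset$, $\rightarrowtail$.

All such relations refer to ground terms. They may be lifted to free terms
in the standard manner: Demanding that $u \gg v$, for terms $u$ and $v$ with free
variables, means that $u\gamma \gg v\gamma$ for all substitutions $\gamma$ of ground terms for those
variables.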

Let $\to$ be a rewrite relation, the termination of which is in question and $\propto$, its
rule application relation. To prove SN($\to$), we make use of various combinations
of conditions involving two basic properties:

Rule($\sqsupset$): $\propto \subseteq \sqsupset$
Reduce($\sqsupset$): $\to \subseteq \sqsupset$

The following relations are all easy:

Rule($\propto$) (9)
Rule($\to$) (10)
Reduce($\to$) (11)
Rule($\gg$) $\wedge$ Rule($\sqsupset$) $\Leftrightarrow$ Rule($\gg \cap \sqsupset$) (12)
Reduce($\gg$) $\wedge$ Reduce($\sqsupset$) $\Leftrightarrow$ Reduce($\gg \cap \sqsupset$) (13)
Rule($\gg$) $\wedge$ Mono($\to \cap \gg$) $\Rightarrow$ Reduce($\gg$) (14)
Rule($\gg$) $\wedge$ Harmony($\to \cap \gg$, $\gg$) $\Leftrightarrow$ Reduce($\gg$) (15)
Reduce($\gg$) $\wedge$ Harmony($\gg$, $\sqsupset$) $\Rightarrow$ Mono($\to \cap \sqsupset$) (16)

Statements (14,15) are by induction on term structure.

As described in Section 2, one can (always) prove termination by showing
that $\to$ is contained in some terminating relation $\succ\!\!\succ$. Accordingly, the first, and
most general, method employed in termination arguments is³:

3 These two methods are usually phrased in terms of well-founded orderings, rather
than terminating binary relations.

Obvious($\succ\!\!\succ$): Reduce($\succ\!\!\succ$).

More precisely, this means

SN($\succ\!\!\succ$) $\wedge$ Reduce($\succ\!\!\succ$) $\Rightarrow$ SN($\to$),

where SN($\succ\!\!\succ$) makes explicit the assumption that $\succ\!\!\succ$ is terminating.

Since Reduce refers to the “global” rewriting of any term at any redex, it is
easier to deal with Rule, which is a “local” condition on rules only, and impose
monotonicity on the relation³:

Standard($\succ\!\!\succ$) [32]: Rule($\succ\!\!\succ$), Mono($\succ\!\!\succ$).
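
One common way to obtain a terminating, monotonic relation for the Standard method is to interpret every function symbol as a strictly increasing function on the positive integers and compare terms by value. The sketch below uses a hypothetical interpretation for the two append rules (the interpretation, the term encoding with "?"-prefixed variables, and the function names are assumptions made here, not taken from the text). It samples the Rule and Mono conditions over a small range of values, which is a sanity check rather than a proof.

```python
from itertools import product

# Hypothetical interpretation over the positive integers.
INTERP = {
    "nil": lambda: 1,
    "cons": lambda x, y: x + y + 1,
    "app": lambda x, y: 2 * x + y,
}

RULES = [
    (("app", "nil", "?z"), "?z"),
    (("app", ("cons", "?x", "?y"), "?z"), ("cons", "?x", ("app", "?y", "?z"))),
]


def value(t, env):
    """Evaluate a term under INTERP, reading variable values from env."""
    if isinstance(t, str):
        return env[t] if t.startswith("?") else INTERP[t]()
    return INTERP[t[0]](*(value(a, env) for a in t[1:]))


def variables(t):
    """The set of variables occurring in t."""
    if isinstance(t, str):
        return {t} if t.startswith("?") else set()
    return set().union(*(variables(a) for a in t[1:]))


def rule_holds(l, r, bound=5):
    """Rule, sampled: the left side evaluates higher than the right side."""
    names = sorted(variables(l) | variables(r))
    for vals in product(range(1, bound + 1), repeat=len(names)):
        env = dict(zip(names, vals))
        if not value(l, env) > value(r, env):
            return False
    return True


def mono_holds(f, arity, bound=5):
    """Mono, sampled: raising any one argument of f raises its value."""
    for args in product(range(1, bound + 1), repeat=arity):
        for i in range(arity):
            bumped = args[:i] + (args[i] + 1,) + args[i + 1:]
            if not INTERP[f](*bumped) > INTERP[f](*args):
                return False
    return True
```

For these particular linear interpretations the sampled inequalities also hold in general, since the difference of the two sides of the second rule is x + 1.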



4 Harmonious Methods 

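The following properties can be used to show that the union of two relations is
terminating:

Commute($\sqsupset$, $\gg$): $\sqsupset \gg \subseteq \gg^+ \sqsupset^*$
Compat($\sqsupset$, $\gg$): $\sqsupset \gg \subseteq \sqsupset$

For example [3]:

Commute($\succ\!\!\succ$, $\succ\!\!\succ'$) $\wedge$ SN($\succ\!\!\succ$) $\wedge$ SN($\succ\!\!\succ'$) $\Rightarrow$ SN($\succ\!\!\succ \cup \succ\!\!\succ'$).

Obviously, if $\succ$ is the strict part of a quasi-order $\succsim$, then:

Compat($\succ$, $\succsim$). (17)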

Requiring that the relation $\succ\!\!\succ$ be monotonic, as in the Standard method of
the previous section, may be too restrictive; all that is actually needed is that it
be harmonious with rewriting:

Kamin & Levy($\succ\!\!\succ$) [29]: Rule($\succ\!\!\succ$), Harmony($\to \cap \succ\!\!\succ$, $\succ\!\!\succ$).

When terms are larger than their subterms, monotonicity can be weakened
to refer instead to a non-strict quasi-ordering $\succsim$ ($\succ$ is its strict part)⁴:

Quasi-Simplification Ordering($\succsim$) [12]: Rule($\succ$), Sub($\succsim$), Mono($\succsim$).

where a binary relation has the subterm property (Sub) if it contains the superterm
relation ($\rhd$):

Sub($\sqsupset$): $\rhd \subseteq \sqsupset$

4 We are ignoring the fact that the subterm and monotonicity conditions for quasi-simplification
orderings obviate the separate need to show that the ordering is well-founded
[12].

By definition:

Sub($\rhd$). (18)

As illustrated in [1], the fact that the Quasi-Simplification Ordering
method, as well as the ones developed in Section 6 below, do not require Mono($\succ$)
means that selected function symbols and argument positions can be ignored
completely (cf. the use of weak monotonicity in [14]). See the example in Section 8.

As before, what is actually needed is that the relation $\succsim$ be monotonic when
restricted to pairs that are related by rewriting:

Subterm($\succsim$) [13]: Rule($\succ$), Sub($\succsim$), Harmony($\to \cap \succsim$, $\succsim$).

Furthermore, the proof of this method in [13] is based on the requirements:

Right($\succsim$): Right($\succ$), Harmony($\to \cap \succsim$, $\succsim$).

Here, we are using the property

Right($\sqsupset$): $\propto \unrhd \subseteq \sqsupset$

meaning that left-hand sides are bigger than all right-side subterms. The composite
relation $\propto \unrhd$ comprises the “dependency pairs” of [1]. Trivially:

Right($\gg$) $\Rightarrow$ Rule($\gg$). (19)
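
The pairs that the Right property ranges over are easy to enumerate at the level of rule schemas. The sketch below is an illustration under assumptions made here: rules are pairs of terms with "?"-prefixed variables, compound terms are tuples headed by their symbol, and the pairs are formed on the rules themselves rather than on their ground instances, which is where the property is actually stated. The function names are hypothetical.

```python
def subterms(t):
    """All subterms of t, including t itself, in preorder."""
    yield t
    if isinstance(t, tuple):
        for a in t[1:]:
            yield from subterms(a)


def right_pairs(rules):
    """Each left side paired with every subterm of its right side."""
    return [(l, u) for l, r in rules for u in subterms(r)]
```

Because every right side is a subterm of itself, the pairs include each rule as is, which is why the Right property implies the Rule property.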

In the following formulations, $\succ\!\!\succ$ is terminating, but $\gg$ need not be. If one
relation is monotonic, then the other should live harmoniously with it:

Harmony($\succ\!\!\succ$, $\gg$): Rule($\succ\!\!\succ \cap \gg$), Mono($\gg$), Harmony($\succ\!\!\succ \cap \gg$, $\succ\!\!\succ$).

The semantic path ordering of [29] (see [13]) is a special case, using $\to$ for $\gg$, for
which only the conditions of the Kamin & Levy method need be shown (see
Lemma 5 below).

Monotonicity($\succ\!\!\succ$, $\gg$): Rule($\succ\!\!\succ \cap \gg$), Mono($\gg$), Harmony($\gg$, $\succ\!\!\succ$).

The monotonic semantic path ordering of [5] uses a semantic path ordering for
$\succ\!\!\succ$, demanding Rule($\gg^* \cap \succ$) and Harmony($\gg$, $\gg^* \cap \succsim$), in the final analysis.

The correctness of these methods is proved in Section 7 below. A more complicated
alternative is

Weak($\succ\!\!\succ$, $\gg$): Right($\succ\!\!\succ$), Harmony($\propto \cap \succ\!\!\succ$, $\gg$), Harmony($\to \cap \gg$, $\gg$),
Commute($\gg$, $\succ\!\!\succ$).



5 Constrictions 

The goal we have been pursuing is to establish finiteness of sequences of transitions,
beginning in any valid state. It will be convenient to define the set (monadic
predicate) $\leadsto^\infty$ of elements that can initiate infinite chains in a relation $\leadsto$ as
follows:

$\leadsto^\infty = \{s_0 \mid \exists s_1, s_2, \ldots\ \forall j.\ s_j \leadsto s_{j+1}\}$.

Thus, $\leadsto^\infty$ is the set of “immortal” initial states. With this notation in mind,
termination of a transition system, SN($\leadsto$), is emptiness of $\leadsto^\infty$ (that is, denial
of immortality):

SN($\leadsto$) $\Leftrightarrow$ $\leadsto^\infty = \emptyset$.
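
For a finite transition graph the set of immortal states can be computed directly: it is the largest set of states in which every member has a successor that is also a member. The sketch below assumes an explicit finite list of states and edges (for infinite state spaces, such as terms under rewriting, no such computation is available in general); the function names are hypothetical.

```python
def immortal(states, edges):
    """States that start an infinite chain, for a finite transition graph."""
    succ = {s: set() for s in states}
    for s, t in edges:
        succ[s].add(t)
    alive = set(states)
    changed = True
    while changed:
        changed = False
        for s in list(alive):
            # A state with no surviving successor cannot continue forever.
            if not (succ[s] & alive):
                alive.discard(s)
                changed = True
    return alive


def terminating(states, edges):
    """Termination is emptiness of the immortal set."""
    return not immortal(states, edges)
```

In a finite graph the surviving states are exactly those from which a cycle is reachable.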

For rewriting, since contexts and rewrites commute ($\rhd \to \subseteq \to \rhd$), meaning that
if a subterm can be rewritten, so can the whole term, we have [15]:

$\to^\infty = (\to \cup \rhd)^\infty$. (20)

Two important observations on nontermination of rewriting can be made:

- If a system is nonterminating, then there is an infinite derivation with at least
one redex at the top of a term. In fact, any immortal term has a subterm
initiating such a derivation:

$\to^\infty \subseteq \unrhd \to^* \propto \to^\infty$. (21)

See, for example, [11], [12, p. 287].

- If a system is nonterminating, then there is an infinite derivation in which
all proper subterms of every redex are mortal. By mortal, we mean that
it initiates finite derivations only. Let’s call such redexes critical. Rewriting
at critical redexes yields a “constricting” derivation in the sense of Plaisted
[37].

For given rewrite relation $\to$, let $T_\infty$ be its immortal terms ($T_\infty = \to^\infty$), $T_{<\infty}$
the mortal ones ($T \setminus T_\infty$), and $T_\circ = T_\infty \cap L[T_{<\infty}]$ the critical terms (immortal
terms all of whose subterms are mortal). To facilitate composition, it will be
convenient to associate a binary relation $P?$ with monadic predicates $P$:

$P? = \{(x, x) \mid x \in P\}$,

the identity relation restricted to the domain of $P$. Let $\propto_\circ = T_\circ? \propto$ be a constricting
rewrite step (at a critical redex) and $\to_\circ = C[\propto_\circ]$ be the corresponding
rewrite relation. The following facts hold:

$\to T_\infty? \subseteq T_\infty? \to$ (22)

$\rhd T_\infty? \subseteq T_\infty? \rhd$ (23)

T< 00? C T< 00? • (24) 

In words: mortals remain mortal after rewriting; mortals beget mortal subterms; 
immortals remain immortal after constriction. 

Let a non-top constriction be denoted $\to_B = \Pi[\to_\circ]$. Let $\to_D$ be a top constriction,
followed by a sequence of projections, followed by a sequence of non-top
constrictions: $\to_D = \propto_\circ \Pi^* \to_B^*$. Considering constrictions suffices for termination
[37]⁵:

y — ““ > o — tl > B > D ■ \ zo ) 

Thus, we aim to show only that $\to_D$ is terminating. To prove this one can use
compatible well-founded orderings $\succsim$ and $\succsim'$ such that $\propto_\circ \unrhd \subseteq \succ$ and $\to_B \subseteq \succsim'$.
This is the basis for the various dependency-pair methods⁶. Since constricting
redexes don’t have immortal children, termination follows even if the condition
$\propto_\circ \unrhd \subseteq \succ$ is weakened to $\propto_\circ \unrhd \subseteq (\succ \cup \rhd)$⁷.

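Therefore, we can restrict the following two properties of rewrite sequences
to refer only to constrictions:

Subrule($\sqsupset$): $\propto_\circ \subseteq \sqsupset \cup \rhd$
Depend($\sqsupset$): $\propto_\circ \unrhd \subseteq \sqsupset \cup \rhd$

where:

Depend($\gg$) $\Rightarrow$ Subrule($\gg$) (26)
Rule($\sqsupset$) $\Rightarrow$ Subrule($\sqsupset$) (27)
Subrule($\gg$) $\wedge$ Sub($\sqsupset$) $\wedge$ Compat($\gg$, $\sqsupset$) $\Rightarrow$ Depend($\gg$) (28)
Subrule($\succ$) $\wedge$ Sub($\succsim$) $\Rightarrow$ Depend($\succ$). (29)

Statement (29) follows from (17,28).

All this establishes the correctness of the following method:

Basic($\succsim$): Depend($\succ$), Reduce($\succsim$).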



6 Dependency Methods 

In what follows, let $\succsim$ and $\succsim'$ be arbitrary well-founded quasi-orderings, and $\succ$
and $\succ'$ their associated strict well-founded partial orderings.

The dependency-pair method [1] of proving termination of rewriting takes
into account the possible transitions from one critical term to the next ($\to_D$) in
an infinite rewriting derivation. Using the notations of the previous sections, we
have two additional variations on this theme:

5 The idea is reminiscent of Tait’s reducibility predicate [41]. Constricting derivations
were also used by [24] to argue about the sufficiency of “forward closures” for proving
termination of “overlaying” systems (see [13]).

6 Another way to understand the dependency method is to transform ordinary rewrite
rules into equational Horn clauses (i.e. conditional rewrite rules; see [42, Sect. 3.5]).
A rule $l \to f(r_1, \ldots, r_n)$ has the same operational behavior as the flat, conditional
rule $r_1 \to^* y_1, \ldots, r_n \to^* y_n : l \to f(y_1, \ldots, y_n)$. Using the decreasing method [18]
for operational termination requires that $l \succ r_1, \ldots, r_n, f(y_1, \ldots, y_n)$ for all $y_i$ such
that $r_i \to^* y_i$.

7 For proving “call-by-value” termination, or for (locally-confluent overlay) rewrite
systems for which innermost derivations are the least likely to terminate, the conditions
can be simplified, since $\to_B$ steps cannot precede a $\to_D$ step. See [1].

Main($\succsim$) [1]: Depend($\succ$), Mono($\to \cap \succsim$).

Intermediate($\succsim$, $\succ'$): Depend($\succ'$), Reduce($\succsim$), Compat($\succ'$, $\succsim$).

More specific techniques may be derived from these. For example, the dependency
method of [20] may be expressed as follows⁸:

Harmonious Dependency($\succsim$, $\succsim'$) [20]: Depend($\succ'$), Rule($\succsim$), Mono($\succsim$),
Harmony($\succsim$, $\succsim'$).

Let $\hat{F}$ be a mirror image of $F$: $\hat{F} = \{\hat{g} \mid g \in F\}$. Denote by $\hat{s}$ the term
$s = f(u_1, \ldots, u_n)$ with root symbol $f \in F$ replaced by its mirror image $\hat{f} \in \hat{F}$,
that is, $\hat{s} = \hat{f}(u_1, \ldots, u_n)$. Let $\hat{T}$ be $T$’s image under $\hat{\ }$. If $\succ$ is a partial ordering
of $\hat{T}$, define another partial ordering $\hat{\succ}$ as $u \mathrel{\hat{\succ}} v$, for terms $u, v \in T$, when $\hat{u} \succ \hat{v}$.
The original dependency-pair method is approximately⁸:

Dependency Pairs($\succsim$) [1]: Depend($\hat{\succ}$), Rule($\succsim$), Mono($\succsim$).

Here, Mono applies to both hatted ($\hat{f} \in \hat{F}$) and bareheaded ($f \in F$) terms,
hence implies Harmony.

A more recent version of the dependency-pair method is essentially:

Variant($\succsim$, $\succ'$) [25]: Depend($\succ'$), Rule($\succsim$), Mono($\succsim$), Compat($\succ'$, $\succsim$).

7 Method Dependencies 

Entailments between the methods are depicted in Fig. 2. The following series of
lemmata justify the figure, by establishing dependencies between the different
methods and their correctness. As a starter, take:

Lemma 1 ([32]). Obvious($\succ\!\!\succ$) $\Rightarrow$ Standard($\succ\!\!\succ$).

In general, such an implication $M \Rightarrow M'$ means that method $M'$ is a special
case of method $M$. To prove the implication, viz. that correctness of the
antecedent method $M$ implies correctness of the consequent $M'$, one shows that
the requirements for $M'$ imply the requirements for $M$. This includes the requirement
that any terminating relation(s) or well-founded ordering(s) used by
$M$ should be a derivative of those used by $M'$.

Suppose method $M(\succ\!\!\succ)$ has requirements $C$ and $M'(\succ\!\!\succ')$ requires $C'$. Then,
to claim $M \Rightarrow M'$ one needs to establish

$C' \wedge$ SN($\succ\!\!\succ'$) $\wedge \neg$SN($\to$) $\Rightarrow$ $C \wedge$ SN($\succ\!\!\succ$).

In particular, to prove Lemma 1, we show that the conditions for the latter
imply the conditions for the former:

8 The dependency-pair methods of [1, 20, 25] exclude only variables $u$ in $r$, rather than
all left-side proper subterms, from the requirement that $l \succsim u$ of Depend($\succsim$) or
$l \succ u$ of Depend($\succ$). This can make a practical difference [35].

Fig. 2. Dependencies of methods. (Numbers refer to lemmata.)



Proof. By (6,14), 

SN(>^) A Rule(>^) A Mono(^) => SN(>^) A Reduce))^) . □ 

Lemma 2. Standard (»- n ^>) => Harmony (>^,^>). 

Here, the implication means that correctness of Harmony, using the terminat- 
ing relation >»-, follows from the Standard method, using the restricted relation 
— » (~l >»- (which is also terminating, by Eq. 1). 

Lemma 3. Harmony => Monotonicity (>^,^>). 

This circle of dependencies can be closed: 

Lemma 4. Monotonicity (»-,—>) => Obvious (>»-). 

The correctness of Obvious using the ordering follows from the Mono- 
tonicity method - using the monotonic rewrite relation — » for ^>. 

Lemma 5. Harmony (>»-,—>) => Kamin & Levy (>»-). 

Proof. We need 

Rule(»-) A Harmony)—* n >¥-) 

=> Rule)—* IT >^-) A Harmony)—* D >»-) A Mono)— *) , 

which follows from (10,3,12). □ 

We have split the argument of [12] for the Quasi-Simplification Ordering 
method into three parts, with the Right and Subterm methods as intermediate 
stages: 
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Lemma 6. Kamin &: Levy (a w ) => Right (£), where s A w t is the well- 
founded multiset extension [17] of A to the bags of all the subterms of s and t. 

Proof. We need to show that 

Right (>-) A Harmony(— > fl £, £) =>• Rule(A w ) A Mono(— > fl >-“) . 

For u ocv, Right (a) means that a bag containing u alone is strictly greater than 
a bag with all of it’s subterms. So, by the nature of the bag ordering (adding to 
a bag makes it bigger), Rule(A^) follows. If u A" w, one only needs to know 
that £[u] P £[v\ for t\u\ A“ £[v\ to hold, which we have thanks to Harmony (and 
Eqs. 15, 19), as long as u — > v. □ 

Lemma 7. Right (£) =4- Subterm (£). 

Proof. To see that 

Rule (a) A Sub(£) A Harmony (— * fl £, £) =>■ Right (a) A Harmony(— » fl P, £) , 

note that Right (a) follows from Rule ( A), Sub(A), and the compatibility of a 
quasi-ordering with its strict part (17). □ 

Lemma 8. Subterm (£) => Quasi-Simplification Ordering (£). 

Proof. Harmony(— » fl 'Zj'Z) follows from Mono(^) and (3,5,7). □ 

Turning to the dependency methods: 

Lemma 9 ([1]). Main (£) => Basic (£). 

Proof. Follows from (14). □ 

Lemma 10. Basic (£) =t> Main (£), for constrictions — > 0 . 

Proof. From Subrule(^) and Mono(— > 0 fl £), one can show R.educe(A) for con- 
strictions. □ 

Lemma 11. Basic (£*) =>• Intermediate (A;, A 7 ), where P* symbolizes the 
transitive closure of P U A 7 . 

Proof. We need 

Depend(A 7 ) A Reduce(^) A Compat(A / , A;) => Depend(A *) A Reduce^*) . 

Note that A* is well-founded on account of Compat(A 7 , A). The rest is straight- 
forward. □ 

Lemma 12. Intermediate (A, A 7 ) =>■ Variant (A, A 7 ). 

Proof. By (6,14). □ 

Lemma 13. Main (A 7 ) => Harmonious Dependency (A, A 7 ). 

Proof. By (14,16). □ 







Lemma 14. Harmonious Dependency (£,£;) => Dependency Pairs (£). 
Proof. Harmony^, ^ 3 ) holds trivially. □ 

The linchpin step is: 

Lemma 15. Main (£;) =>■ Subterm (( 3 ). 

Proof. This follows from (27,29). □ 



8 Illustrations 



We conclude with an example. 

Example 2. Consider the seven rules: 

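$$\begin{array}{rcl}
0 + x &\to& x \\
0 - x &\to& 0 \\
0 \times x &\to& 0 \\
s(x) + y &\to& s(x + y) \\
s(x) - s(y) &\to& x - y \\
s(x) \times y &\to& y + (x \times y) \\
s(x) \times (y + z) &\to& x \times (y + z) + (y + z)
\end{array}$$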



To prove termination by the Harmonious Dependency method, we can use 
the style of the general path ordering [14, 23], which allows one to compare terms 
by comparing a mix of precedences, interpretations, and selected arguments. 
Take a “natural” interpretation [•] to show that s — » t preserves the value of 
the interpretation (this natural equivalence of value will be ( 3)5 and for >-' use a 
termination function based on an interpretation {[■}}, where: 



[0] 

[s(aO] 


= 0 

= N + 1 


{{s(o:)S = 


(s, 0 ,0) 


Ix + yj 


= H + [y] 


+ yj = 


<+, hi, N) 


\x-y\ 


= H -min([a:],Iy]) 


lx - yj = 


<-> W,o) 


h X y] 


= lx] • M 


x y } = 


< x , jfar] , 0) , 



with triples ordered lexicographically. The precedence $\times > + > - > s$ is the 
abstraction 9 . Terms in the same abstract class are compared by the remaining 
components, which express the recursion scheme. Harmony follows from the 
use of [•] in {{ ■ }} . Now, one shows the following inequalities for constricting 
transitions: 



{{ s 0) + yj > IX® + 2 /)}}, {a; + 

{{ s 0) - s(j/)} > |® - y S 

IX®) x y] S- > {{y + (x x y)}, {{;r x yj 
{{s(®) x (y + z)}} > {{.t x (y + z) + (y + 2 )}} , •§> x (y + 2 )}} . □ 
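
The claim that every rewrite step preserves the value of the natural interpretation can be spot-checked numerically. The sketch below is only an illustration, not part of the termination proof: it assumes the seven rules as read from Example 2, interprets 0, s, +, - and x as zero, successor, addition, truncated subtraction and multiplication on the naturals, and tests a small grid of values rather than proving the equalities. The names `monus`, `RULES` and `value_preserving` are hypothetical.

```python
from itertools import product

def monus(a, b):
    """Truncated subtraction on naturals, written as a - min(a, b)."""
    return a - min(a, b)

# Each entry maps (x, y, z) to the values of the left and right side of a
# rule under the natural interpretation (s(x) is read as x + 1).
RULES = [
    ("0 + x -> x", lambda x, y, z: (0 + x, x)),
    ("0 - x -> 0", lambda x, y, z: (monus(0, x), 0)),
    ("0 * x -> 0", lambda x, y, z: (0 * x, 0)),
    ("s(x) + y -> s(x + y)", lambda x, y, z: ((x + 1) + y, (x + y) + 1)),
    ("s(x) - s(y) -> x - y", lambda x, y, z: (monus(x + 1, y + 1), monus(x, y))),
    ("s(x) * y -> y + (x * y)", lambda x, y, z: ((x + 1) * y, y + x * y)),
    ("s(x) * (y + z) -> x * (y + z) + (y + z)",
     lambda x, y, z: ((x + 1) * (y + z), x * (y + z) + (y + z))),
]

def value_preserving(bound=6):
    """Return the rules whose two sides differ somewhere on [0, bound)^3."""
    failures = []
    for name, sides in RULES:
        for x, y, z in product(range(bound), repeat=3):
            lhs, rhs = sides(x, y, z)
            if lhs != rhs:
                failures.append(name)
                break
    return failures
```

An empty result from `value_preserving()` means no counterexample was found on the tested grid; it does not replace the algebraic argument.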

9 There is no need to explicitly exclude the case that u is headed by a constructor (as 
done in [1]). One simply makes terms headed by constructors smaller than terms 
headed by defined symbols, which is why s is minimal. The constructor 0 need not 
be interpreted, since it appears on the left of every rule in which it appears at all. 
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Rather than a simple precedence, one can devise a “pattern-based” ordering. 
Patterns that can never have a top redex are made minimal in the surface order- 
ing, and safely ignored. Symbolic inductive techniques may be used to discover 
patterns that generalize terms in computations. 
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1 Introduction 



Answer set programming (ASP) (see, for instance, [22]) is a new declarative 
programming paradigm suitable for solving a large range of problems related to 
knowledge representation and search. The paradigm is rooted in recent develop- 
ments in several areas of artificial intelligence. ASP starts by encoding relevant 
domain knowledge as a (possibly disjunctive) logic program, $\Pi$. The connectives 
of this program are normally understood in accordance with the answer set (sta- 
ble model) semantics [12,13]. The corresponding language is frequently referred 
to as A-Prolog (or ANS-Prolog). The language’s ability to express defaults, i.e. 
statements of the form “normally, objects of class C have property P”, coupled 
with its natural treatment of recursion, and other useful features, often leads 
to a comparatively concise and clear representation of knowledge. Insights into 
the nature of causality and its relationship with the answer sets of logic programs 
[14,21,25] allow for a description of the effects of actions that solves 
the frame, ramification, and qualification problems, which for a long time have 
caused difficulties in modeling knowledge about dynamic domains. 

In the second stage of the ASP programming process, a programming task is 
reduced to finding the answer sets of a logic program $\Pi \cup P$ where $P$ is normally a 
simple and short program corresponding to this task. The answer sets are found 
with the help of programming systems [23,9,10,16,20] implementing various 
answer set finding algorithms. 

During the last few years the answer set programming paradigm seems to 
have crossed the boundaries of artificial intelligence and has started to attract 
people in various areas of computer science. The recent book [6] contains the 
first comprehensive introduction to the subject. 

In this talk I will discuss the use of ASP for the design and implementation of 
software components of deliberative agents capable of reasoning, planning and 
acting in a changing environment. We assume that such an agent has knowledge 
about its domain and about its own capabilities and goals. It constantly 



B. Demoen and V. Lifschitz (Eds.): ICLP 2004, LNCS 3132, pp. 19-26, 2004. 
(c) Springer- Verlag Berlin Heidelberg 2004 





1. observes the world, explains the observations, and updates its knowledge 
base; 

2. selects an appropriate goal, G; 

3. looks for a plan (sequence of actions $a_1, \ldots, a_n$) to achieve G; 

4. executes some initial part of the plan, updates the knowledge base, and goes 
back to step (1). 

Initially we will assume [8] that 

— The agent’s environment can be viewed as a transition diagram whose states 
are sets of fluents 1 and whose arcs are labeled by actions. 

— The agent is capable of making correct observations, performing actions, and 
remembering the domain history. 

— Normally the agent is capable of observing all relevant exogenous events 
occurring in its environment. 

The ASP methodology of design and implementation of such agents consists in 

— Using A-Prolog for the description of a transition diagram representing pos- 
sible trajectories of the domain, history of the agent’s observations, actions, 
occurrences of exogenous events, agent’s goals and information about pre- 
ferred or most promising actions needed to achieve these goals, etc. 

— Reducing the reasoning tasks of an agent (including planning and diagnos- 
tics) to finding answer sets of programs containing this knowledge. 

In this talk I illustrate the basic idea of this approach by discussing its use 
for the development of the USA-Advisor decision support system for the Space 
Shuttle. The largest part of this work was done by my former and current stu- 
dents Dr. Monica Nogueira, Marcello Balduccini, and Dr. Richard Watson, in 
close cooperation with Dr. Matt Barry from the USA Advanced Technology De- 
velopment Group [1, 2, 5, 24]. From the standpoint of engineering, the goal of our 
project was to design a system to help flight controllers plan for correct opera- 
tions of the shuttle in situations where multiple failures have occurred. While the 
methods used in this work are general enough to model any of the subsystems of 
the shuttle, for our initial prototypes we modeled the Reaction Control System 
(RCS). The project consisted of two largely independent parts: modeling of the 
RCS, and the development of a planner for the RCS domain. 

2 Modeling the RCS 

The RCS is a system for maneuvering the spacecraft which consists of fuel and 
oxidizer tanks, valves, and other plumbing needed to deliver propellant to the 
maneuvering jets, electrical circuitry to control the valves, and to prepare the 
jets to receive firing commands. The RCS is controlled by flipping switches and 
issuing computer commands to open and close the corresponding valves. We 

1 A fluent is a proposition whose truth value may change as a result of actions. 
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are interested in a situation when a shuttle controller has the description of the 
RCS, and a collection of faults reported to him by the observers, e.g. switch 5 
and valve 8 are stuck in the open position, circuit 7 is malfunctioning, valve 3 
is leaking, etc. The controller’s goal is to find a sequence of actions (a plan) 
to perform a desired maneuver. The controller may use the USA-Advisor to test whether 
a plan he came up with manually actually satisfies the goal, and/or to find a 
plan to achieve this goal. The USA-Advisor consists of a collection of largely 
independent modules, represented by A-Prolog programs. A Java interface asks 
the user about the history of the RCS, its faults, and the task to be performed. 
Based on this information, the interface selects an appropriate combination of 
modules and assembles them into an A-Prolog program. The program is passed 
as an input to a reasoning system for computing stable models (SMODELS). 
The desired plans are extracted from these models by the Java interface. 

In its simplest form, the RCS can be viewed as a directed graph, with nodes 
corresponding to tanks, pipe junctions, and jets, and links labeled by valves, 
together with a collection of switches controlling the positions of valves. 






This information can be encoded by facts describing objects of the domain and 
their connections: 

tank_of(tank,fwd_rcs). 
jet_of(jet,fwd_rcs). 
link(tank,junc2,v3). 
controls(sw3,v3). 

A state of the RCS is given by fluents, including: 

pressurized_by(N,Tk) - node N is pressurized by a tank Tk 
in_state(V,P) - valve V is in valve position P 
in_state(Sw,P) - switch Sw is in switch position P 

A typical action is: 

flip(Sw,P) - flip switch Sw to position P 

Our description of the corresponding transition diagram is based on McCain- 
Turner style theories of action and change [21, 18] and a number of results es- 
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tablishing the relationship between these theories and logic programs [14, 25]. In 
this approach the direct effects of actions are specified by dynamic causal laws: 

holds(f,T+1) :- holds(p,T), occurs(a,T). 

static causal laws: 

holds(f,T) :- holds(p,T). 

and impossibility conditions: 

:- occurs(a,T), holds(p,T). 

Below are several examples of such laws in the context of the RCS: 

Direct effect of flip(Sw,P) on in_state: 

holds(in_state(Sw,P),T+1) :- 
    occurs(flip(Sw,P),T), 
    not stuck(Sw). 

Indirect effects of flip(Sw,P) on the fluents in_state and pressurized_by are 
given by the following static causal laws: 

holds(in_state(V,P),T) :- 
    controls(Sw,V), 
    holds(in_state(Sw,P),T), 
    not stuck(V), 
    not bad_circuitry(V). 

holds(pressurized_by(Tk,Tk),T) :- 
    tank(Tk). 

holds(pressurized_by(N1,Tk),T) :- 
    link(N2,N1,V), 
    holds(in_state(V,open),T), 
    holds(pressurized_by(N2,Tk),T). 

Note that the last law is recursive and that the effect of a single action propagates 
and affects several fluents. 
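
The recursive pressurization law can be read as a least-fixpoint computation over the plumbing graph. The following Python sketch is only an illustration of that reading for a single time step, not the A-Prolog semantics itself: it assumes links are given as (from, to, valve) triples in the argument order of link(N2,N1,V), and the node jet and valve v4 used in the usage note are hypothetical.

```python
def pressurized_by(tanks, links, open_valves):
    """Least fixpoint of the two pressurized_by laws at one moment of time.

    tanks:       iterable of tank names
    links:       iterable of (n_from, n_to, valve) triples, as in link(N2, N1, V)
    open_valves: set of valves currently in the open position
    Returns the set of (node, tank) pairs such that node is pressurized by tank.
    """
    facts = {(tk, tk) for tk in tanks}        # every tank pressurizes itself
    changed = True
    while changed:                            # iterate until nothing new is derived
        changed = False
        for n_from, n_to, valve in links:
            if valve not in open_valves:
                continue                      # a closed valve blocks propagation
            for node, tk in list(facts):
                if node == n_from and (n_to, tk) not in facts:
                    facts.add((n_to, tk))
                    changed = True
    return facts
```

For example, with links `[("tank", "junc2", "v3"), ("junc2", "jet", "v4")]` and both valves open, pressure from the tank reaches junc2 and then jet, which mirrors how the effect of a single flip propagates through several fluents.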

Similar simple laws (including a simple theory of electrical circuits) constitute 
a commonsense theory, EMD, of simple electro-mechanical devices. Program T, 
consisting of EMD together with a collection, S, of atoms describing 
the RCS (which can be automatically generated from the RCS schematics), 
incorporates the controller’s knowledge about the RCS. 

3 The USA Planning Module 

The structure of the USA Planning Module described in this section follows the 
generate and test approach from [11,17]. The following rules, PM, form the 
heart of the planner. The first rule states that, for each moment of time from 
[0,n) if the goal has not been reached for one of the RCS subsystems, then an 
action should occur at that time. 
1{occurs(A,T) : action_of(A,R)}1 :- T < n, 
                                    system(R), 
                                    not goal(T,R). 

The second rule states that the overall goal has been reached if the goal has been 
reached on each subsystem at some time. 

goal :- goal(T1,left_rcs), 
        goal(T2,right_rcs), 
        goal(T3,fwd_rcs). 

:- not goal. 

Finally, the last rule states that failure to achieve a goal is not an option. Based 
on results from [25] one can show that, given a collection of faults, $I$, and a goal 
$G$, finding a plan for $G$ of max length $n$ can be reduced to finding an answer set 
of the program 

$T \cup I \cup PM$ 

Since the RCS contains more than 200 actions, with rather complex effects, this 
standard approach needs to be substantially improved to provide the necessary 
efficiency. 

To improve the efficiency of the planner, and the quality of plans, we will 
expand our planning module with control knowledge from the operating man- 
ual for system controllers. For instance, we include a statement “Normal valve, 
connecting node N1 with node N2, shall not be open if N1 is not pressurized” 
which can be expressed in A-Prolog as follows: 

holds(pressurized(N),T) :- 
    node(N), tank(Tk), 
    holds(pressurized_by(N,Tk),T). 

:- link(N1,N2,V), 
   not stuck(V), 
   holds(in_state(V,open),T), 
   not holds(pressurized(N1),T). 

After the basic planning module is expanded by about twenty such rules, plan- 
ning becomes efficient, and the quality of the returned plans improves dramat- 
ically. We ran thousands of experiments in which faults were generated at ran- 
dom, and the planner was run in a loop with n ranging from 0 to the maximal 
possible length (easily available for the RCS system) . In most cases the answers 
were found in seconds. We were always able to find a plan, or to discover that the 
goal cannot be achieved, in less than 20 minutes, the time limit given to us by 
the USA controllers. 
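
The experimental loop described above, with n ranging from 0 up to a maximal length, follows a generate-and-test pattern. The Python sketch below illustrates only that pattern: plain enumeration of action sequences stands in for the answer set solver, and the toy switch domain (`STUCK`, `step`, `ACTIONS`) is hypothetical rather than taken from the RCS model.

```python
from itertools import product

def find_shortest_plan(actions, initial, step, goal, max_len):
    """Try plan lengths n = 0, 1, ..., max_len; return the first plan that works."""
    for n in range(max_len + 1):
        for plan in product(actions, repeat=n):   # generate a candidate plan
            state = initial
            for action in plan:
                state = step(state, action)
            if goal(state):                       # test the candidate
                return list(plan)
    return None                                   # no plan of length <= max_len

# Hypothetical toy domain: a state is the set of switches that are open.
STUCK = {"sw3"}                                   # a stuck switch ignores flips

def step(state, action):
    switch, position = action
    if switch in STUCK:
        return state
    return state | {switch} if position == "open" else state - {switch}

ACTIONS = [(sw, pos) for sw in ("sw1", "sw2", "sw3") for pos in ("open", "closed")]
```

Calling `find_shortest_plan(ACTIONS, frozenset(), step, lambda s: {"sw1", "sw2"} <= s, 3)` returns a two-step plan, while a goal that needs the stuck switch yields `None`. Blind enumeration grows exponentially with n, which is one way to see why the control knowledge mentioned above matters.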

It may be instructive to do some comparison of ASP planning with more 
traditional approaches. 

— The ability to specify causal relations between fluents is crucial for the suc- 
cess of our project. ‘Classical’ planning languages cannot express such rela- 
tions. 
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— We do not use any specialized planning algorithm. Finding a plan is reduced 
to finding an answer set of a program. This is of course similar to satisfiabil- 
ity planning [15]. In fact recent results [20] show that in many cases finding 
a stable model of a program can be reduced to finding classical models of 
its “propositional translation”. However, in [19] the authors show that any 
equivalent transformation from logic programs to propositional formulas 
involves a significant increase in size. This confirms our experience that the 
ability of A-Prolog to represent the domain’s transition diagram, its control 
knowledge, and complex initial situations allows one to naturally formalize 
domains for which propositional formalizations are far from obvious. 

— The program is fully declarative. This means that if the causal laws correctly 
describe the system’s behavior then the plans returned by our system are 
provably correct. This also simplifies informal validation of our model and 
allows for simple modification and reuse. 

— One can move from sequential to parallel planning by simply changing a 
constant. 

— Finally, recent work on extensions of A-Prolog allowed us to declaratively 
specify preferences between plans. E.g. the system returns plans which use 
computer commands only if it is unavoidable, etc. (See [5]). 

4 Future Work 

The discussion above shows that ASP planning works well in domains where 
actions have complex effects depending on a large body of knowledge, and where 
plans are comparatively short. If we are interested in long plans, or in plans 
which require non-trivial computation, current answer set solvers do not work 
well. This is due to the fact that all the solvers start their work by grounding 
the corresponding program. Even though the grounding algorithms are smart 
and economical and the current solvers can deal with hundreds of thousands 
of ground rules this is still a very serious limitation. Finding new algorithms 
with some kind of “grounding on demand” mechanism is a very interesting and 
important research problem. 

Now a few words about other reasoning steps in the agent observe-think-act 
loop. In [3] we showed how the process of diagnostics (finding an explanation 
for an unexpected observation) can be reduced to finding answer sets of a logic 
program. In the context of the RCS, this program consists of the controller’s 
knowledge T, observations on the values of fluents, and a diagnostics module DM 
similar to PM. The program efficiently finds collections of faults (a stuck valve, a 
malfunctioning electrical circuit, etc) which explain the unexpected observation. 
It is important to notice that the same knowledge, T, is used for diagnostics 
and for planning. In other knowledge representation languages this is usually 
not the case - each task requires its own representation. It is important to make 
sure that the system has enough information to first return diagnoses which are 
most plausible or most suitable for testing, etc. To achieve this goal we need to 
better understand how such information can be expressed in A-Prolog and its 
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extensions. Some of the ideas related to this task are discussed in [4, 7] but much 

more work is needed to make these ideas practical. 
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Abstract. We show how to transform a set of regular type definitions 
into a finite pre-interpretation for a logic program. The derived pre- 
interpretation forms the basis for an abstract interpretation. The core of 
the transformation is a determinization procedure for non-deterministic 
finite tree automata. This approach provides a flexible and practical way 
of building program-specific analysis domains. We argue that the con- 
structed domains are condensing: thus goal-independent analysis over 
the constructed domains loses no precision compared to goal-dependent 
analysis. We also show how instantiation modes such as ground, variable 
and non-variable can be expressed as regular types and hence integrated 
with other regular types. We highlight applications in binding time anal- 
ysis for offline partial evaluation and infinite-state model checking. Ex- 
perimental results and a discussion of complexity are included. 



1 Background 

There is a well-established connection between regular types and finite tree au- 
tomata (FTAs) [1], although typical regular type definition notations [2,3] usu- 
ally correspond to top-down deterministic FTAs, which are less expressive than 
FTAs in general. We show how to build an analysis domain from any FTA on 
a given program’s signature, by transforming it to a pre-interpretation for the 
signature. The main contribution of this paper is thus to link the rich descriptive 
framework of arbitrary FTAs, which includes modes and regular types, to the 
analysis framework based on pre-interpretations, and demonstrate the practical- 
ity of the link and the precision of the resulting analyses. 

In Section 2, we introduce the essential concepts from types and FTAs [4] and 
a review of the approach to logic program analysis based on pre-interpretations 
[5-7]. In Section 3 it is shown how to determinize a given FTA in order to construct 
a pre-interpretation. Section 4 contains some examples. Implementation 
and complexity issues are discussed in Section 5. 

An informal example is given first, to give some intuition into the procedure. 

Example 1. Given a program containing functions [], [_|_] and unary function f, 
let the type list be defined by list = []; [any|list] and the type any as any = 
f(any); []; [any|any]. Clearly list and any are not disjoint: any includes list. We 
can determinize the types, yielding two types list and nonlist (l and nl) with 
type rules l = []; [nl|l]; [l|l] and nl = [nl|nl]; [l|nl]; f(l); f(nl). Every ground term 
in the language of the program is in exactly one of the two types. 

* Work supported in part by European Framework 5 Project ASAP (IST-2001-38059), 
and the IT-University of Copenhagen. 

Computing a model of the usual append program over the elements l and 
nl (using a procedure to be described) we will obtain the following “abstract 
model” of the relation append/3: {append(l,l,l), append(l,nl,nl)}. This can be 
interpreted as showing that for any correct answer to append(X, Y, Z), X is a list, 
and Y and Z are of the same type (list or nonlist). Note that it is not possible 
to express the same information using only list and any, since whenever any is 
associated with an argument position, its subtype list is automatically associated 
with that argument too. Hence a description of the model of the program using 
the original types could be no more precise than {append(list,any,any)}. 
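
The abstract model quoted in Example 1 can be reproduced with a few lines of Python. This is a minimal sketch, assuming the usual two-clause definition of append and a naive bottom-up iteration to a fixpoint; it is not the analysis procedure developed later in the paper, and the names `D`, `NIL`, `cons` and `append_model` are hypothetical.

```python
D = ("l", "nl")          # the two determinized types: list and nonlist
NIL = "l"                # [] denotes a list

def cons(head, tail):
    """Interpretation of [_|_]: the result is a list exactly when the tail is."""
    return "l" if tail == "l" else "nl"

def append_model():
    """Least fixpoint over D of the program
         append([], Ys, Ys).
         append([X|Xs], Ys, [X|Zs]) :- append(Xs, Ys, Zs).
    Atoms append(a, b, c) are represented as triples (a, b, c)."""
    model = set()
    while True:
        derived = {(NIL, ys, ys) for ys in D}                 # the fact
        derived |= {(cons(x, xs), ys, cons(x, zs))            # the recursive clause
                    for x in D for (xs, ys, zs) in model}
        if derived <= model:                                  # nothing new: fixpoint
            return model
        model |= derived
```

`append_model()` evaluates to the two triples ("l", "l", "l") and ("l", "nl", "nl"), matching the abstract model {append(l,l,l), append(l,nl,nl)} given in the example.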

2 Preliminaries 

Let $\Sigma$ be a set of function symbols. Each function symbol in $\Sigma$ has a rank 
(arity) which is a natural number. Whenever we write an expression such as 
$f(t_1, \ldots, t_n)$, we assume that $f \in \Sigma$ and has arity $n$. We write $f^n$ to indicate 
that function symbol $f$ has arity $n$. If the arity of $f$ is 0 we write the term $f()$ 
as $f$ and call $f$ a constant. The set of ground terms (or trees) $\mathit{Term}_\Sigma$ associated 
with $\Sigma$ is the least set containing the constants and all terms $f(t_1, \ldots, t_n)$ such 
that $t_1, \ldots, t_n$ are elements of $\mathit{Term}_\Sigma$ and $f \in \Sigma$ has arity $n$. 

A finite tree automaton (FTA) is a means of finitely specifying a possibly 
infinite set of terms. An FTA is defined as a quadruple $\langle Q, Q_f, \Sigma, \Delta \rangle$, where $Q$ 
is a finite set of states, $Q_f \subseteq Q$ is the set of accepting (or final) states, $\Sigma$ is a 
set of ranked function symbols and $\Delta$ is a set of transitions. Each element of $\Delta$ 
is of the form $f(q_1, \ldots, q_n) \to q$, where $f \in \Sigma$ and $q, q_1, \ldots, q_n \in Q$. 

FTAs can be “run” on terms in $\mathit{Term}_\Sigma$; a successful run of a term and an FTA 
is one in which the term is accepted by the FTA. When a term is accepted, it is 
accepted by one or more of the final states of the FTA. Different runs may result 
in acceptance by different states. At each step of a successful bottom-up run, 
some subterm identical to the left-hand-side of some transition is replaced by the 
right-hand-side, until eventually the whole term is reduced to some accepting 
state. The details can be found elsewhere [4]. Implicitly, a tree automaton R 
defines a set of terms (or tree language), denoted L(R), which is the set of all 
terms that it accepts. 

Tree Automata and Types. An accepting state of an FTA can be regarded 
as a type. Given an automaton $R = \langle Q, Q_f, \Sigma, \Delta \rangle$, and $q \in Q_f$, define the 
automaton $R_q$ to be $\langle Q, \{q\}, \Sigma, \Delta \rangle$. The language $L(R_q)$ is the set of terms 
corresponding to type $q$. A term $t$ is of type $q$ if and only if $t \in L(R_q)$. 

A transition $f(q_1, \ldots, q_n) \to q$, when regarded as a type rule, is usually 
written the other way around, as $q \to f(q_1, \ldots, q_n)$. Furthermore, all the rules 
defining the same type, $q \to R_1, \ldots, q \to R_n$, are collected into a single equation 
of the form $q = R_1; \ldots; R_n$. When speaking about types we will usually follow the 
type notation, but when discussing FTAs we will use the notation for transitions, 
in order to make it easier to relate to the literature. 

Example 2. Let $Q = \{\mathit{listnat}, \mathit{nat}\}$, $Q_f = \{\mathit{listnat}\}$, $\Sigma = \{[], [\_|\_], s^1, 0^0\}$, $\Delta = 
\{[] \to \mathit{listnat}, [\mathit{nat}|\mathit{listnat}] \to \mathit{listnat}, 0 \to \mathit{nat}, s(\mathit{nat}) \to \mathit{nat}\}$. The type listnat 
is the set of lists of natural numbers in successor notation; the type rule notation 
is listnat = []; [nat|listnat], and nat = 0; s(nat). 

Let $\Sigma = \{[], [\_|\_], s^1, 0^0\}$, $Q = \{\mathit{zero}, \mathit{one}, \mathit{list}_0, \mathit{list}_1\}$, $Q_f = \{\mathit{list}_1\}$, and 
$\Delta = \{[] \to \mathit{list}_1, [\mathit{one}|\mathit{list}_1] \to \mathit{list}_1, [\mathit{zero}|\mathit{list}_0] \to \mathit{list}_1, [] \to \mathit{list}_0, [\mathit{zero}|\mathit{list}_0] \to 
\mathit{list}_0, 0 \to \mathit{zero}, s(\mathit{zero}) \to \mathit{one}\}$, or $\mathit{list}_1 = []; [\mathit{one}|\mathit{list}_1]; [\mathit{zero}|\mathit{list}_0]$ and $\mathit{list}_0 = 
[]; [\mathit{zero}|\mathit{list}_0]$. The type $\mathit{list}_1$ is the set of lists consisting of zero or more elements 
s(0) followed by zero or more elements 0 (such as [s(0), 0], [s(0), s(0), 0, 0, 0], 
[0, 0], [s(0)], ...). This kind of set is not normally thought of as a type. 
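
A bottom-up run of a (possibly non-deterministic) FTA can be sketched by computing, for each subterm, the set of states it can be reduced to. The Python below illustrates this on the second automaton of Example 2. The term encoding (nested tuples headed by the hypothetical symbol names "nil", "cons", "s" and "0") and the helper names are assumptions made for the illustration.

```python
def states_of(term, delta):
    """Set of states that `term` can be reduced to by some bottom-up run.

    term:  nested tuple (symbol, subterm, ...)
    delta: set of transitions ((symbol, q1, ..., qn), q)
    """
    symbol, *args = term
    arg_states = [states_of(arg, delta) for arg in args]
    return {q for (lhs, q) in delta
            if lhs[0] == symbol and len(lhs) == len(args) + 1
            and all(qi in reachable for qi, reachable in zip(lhs[1:], arg_states))}

def accepts(term, delta, final):
    """A term is accepted if some run ends in an accepting state."""
    return bool(states_of(term, delta) & final)

# Second automaton of Example 2.
DELTA = {
    (("nil",), "list1"), (("cons", "one", "list1"), "list1"),
    (("cons", "zero", "list0"), "list1"),
    (("nil",), "list0"), (("cons", "zero", "list0"), "list0"),
    (("0",), "zero"), (("s", "zero"), "one"),
}
FINAL = {"list1"}

ZERO = ("0",)
ONE = ("s", ZERO)

def lst(*items):
    """Build the term for the list [items...]."""
    term = ("nil",)
    for item in reversed(items):
        term = ("cons", item, term)
    return term
```

With these definitions `accepts(lst(ONE, ZERO), DELTA, FINAL)` holds, while `accepts(lst(ZERO, ONE), DELTA, FINAL)` does not, since an s(0) may not follow a 0.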



Deterministic and Non-deterministic Tree Automata. It can be shown 
that (so far as expressiveness is concerned) we can limit our attention to FTAs 
in which the set of transitions $\Delta$ contains no two transitions with the same 
left-hand-side. These are called bottom-up deterministic finite tree automata. 
For every FTA $R$ there exists a bottom-up deterministic FTA $R'$ such that 
$L(R) = L(R')$. The sets of terms accepted by states of bottom-up deterministic 
FTAs are disjoint. Each term in $L(R')$ is accepted by exactly one state. 

An automaton $R = \langle Q, Q_f, \Sigma, \Delta \rangle$ is called complete if for all $n$-ary functions 
$f \in \Sigma$ and states $q_1, \ldots, q_n \in Q$, it contains a transition $f(q_1, \ldots, q_n) \to q$. We 
may always extend an FTA $\langle Q, Q_f, \Sigma, \Delta \rangle$ to make it complete, by adding a new 
state to $Q$ and then adding a transition from $f(q_1, \ldots, q_n)$ to the new state for every 
combination of $f$ and states $q_1, \ldots, q_n$ (including the new state) that does not appear in $\Delta$. 
A complete bottom-up deterministic finite tree automaton in which every state 
is an accepting state partitions the set of terms into disjoint subsets (types), one 
for each state. In such an automaton the added state can be thought of as the error type, that 
is, the set of terms not accepted by any other type. 

Example 3. Let $\Sigma = \{[], [\_|\_], 0^0\}$, and let $Q = \{\mathit{list}, \mathit{listlist}, \mathit{any}\}$. We define 
the set $\Delta_{any}$, for a given $\Sigma$, to be the set of transitions 
$\{f(\underbrace{\mathit{any}, \ldots, \mathit{any}}_{n\ \text{times}}) \to \mathit{any} \mid f^n \in \Sigma\}$. 
Let $Q_f = \{\mathit{list}, \mathit{listlist}\}$, $\Delta = \{[] \to \mathit{list}, [\mathit{any}|\mathit{list}] \to \mathit{list}, [] \to 
\mathit{listlist}, [\mathit{list}|\mathit{listlist}] \to \mathit{listlist}\} \cup \Delta_{any}$. The type list is the set of lists of any 
terms, while the type listlist is the set of lists whose elements are of type list; 
note that list includes listlist. 

The automaton is not bottom-up deterministic; for example, three transitions 
have the same left-hand-side, namely, $[] \to \mathit{list}$, $[] \to \mathit{listlist}$ and $[] \to \mathit{any}$. So for 
example the term [[0]] is accepted by list, listlist and any. A determinization 
algorithm can be applied, yielding the following: $q_1$ corresponds to the type $\mathit{any} \cap 
\mathit{list} \cap \mathit{listlist}$, $q_2$ to the type $(\mathit{list} \cap \mathit{any}) - \mathit{listlist}$, and $q_3$ to $\mathit{any} - (\mathit{list} \cup \mathit{listlist})$. 
Thus $q_1$, $q_2$ and $q_3$ are disjoint. The automaton is given by $Q = \{q_1, q_2, q_3\}$, $\Sigma$ 
as before, $Q_f = \{q_1, q_2\}$ and $\Delta = \{[] \to q_1, [q_1|q_1] \to q_1, [q_2|q_1] \to q_1, [q_1|q_2] \to 
q_2, [q_2|q_2] \to q_2, [q_3|q_2] \to q_2, [q_3|q_1] \to q_2, [q_2|q_3] \to q_3, [q_1|q_3] \to q_3, [q_3|q_3] \to 
q_3, 0 \to q_3\}$. This automaton is also complete; the determinization of this example 
will be discussed in more detail in Section 3. 
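
The determinized automaton of Example 3 can be reproduced with a textbook subset construction for bottom-up tree automata. The sketch below is offered only as an illustration under that assumption: it is not necessarily the procedure of Section 3, and the encoding (symbols "nil", "cons", "0", states as frozensets of original states) is hypothetical.

```python
from itertools import product

def determinize(sigma, delta):
    """Naive subset construction for a bottom-up finite tree automaton.

    sigma: dict mapping each function symbol to its arity
    delta: set of transitions ((symbol, q1, ..., qn), q)
    Returns (states, transitions); each new state is a frozenset of old states
    and transitions maps (symbol, (S1, ..., Sn)) to a new state.
    """
    states, transitions = set(), {}
    changed = True
    while changed:
        changed = False
        for symbol, arity in sigma.items():
            for args in product(list(states), repeat=arity):
                if (symbol, args) in transitions:
                    continue
                # Collect every old state reachable from these argument sets.
                target = frozenset(
                    q for (lhs, q) in delta
                    if lhs[0] == symbol and len(lhs) == arity + 1
                    and all(qi in s for qi, s in zip(lhs[1:], args)))
                if target:
                    transitions[(symbol, args)] = target
                    states.add(target)
                    changed = True
    return states, transitions

# The automaton of Example 3, including the transitions for `any`.
SIGMA = {"nil": 0, "cons": 2, "0": 0}
DELTA = {
    (("nil",), "list"), (("cons", "any", "list"), "list"),
    (("nil",), "listlist"), (("cons", "list", "listlist"), "listlist"),
    (("nil",), "any"), (("cons", "any", "any"), "any"), (("0",), "any"),
}
STATES, TRANSITIONS = determinize(SIGMA, DELTA)
```

Here `STATES` contains three sets, {list, listlist, any}, {list, any} and {any}, which play the roles of $q_1$, $q_2$ and $q_3$, and `TRANSITIONS` has eleven entries, in agreement with the transitions listed in the example.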

An FTA is top-down deterministic if it has no two transitions with both the same 
right-hand-side and the same function symbol on the left-hand-side. Top-down 
determinism introduces a loss in expressiveness. It is not the case that for each 
FTA R there is a top-down deterministic FTA R' such that L(R) = L(R'). Note 
that a top-down deterministic automaton can be transformed to an equivalent 
bottom-up deterministic automaton, as usual, but the result might not be top- 
down deterministic. 

Example 4. Take the second automaton from Example 2. This is not top-down 
deterministic, due to the presence of transitions $[\mathit{one}|\mathit{list}_1] \to \mathit{list}_1$, $[\mathit{zero}|\mathit{list}_0] \to 
\mathit{list}_1$. No top-down deterministic automaton can be defined that has the same 
language. Thus the set accepted by $\mathit{list}_1$ could not be defined as a type, using 
type notations that require top-down deterministic rules (e.g. [2,3]). 

Example 5. We define the set $\Delta_{any}$ as before. Consider the automaton with transitions 
$\Delta_{any} \cup \{[] \to \mathit{list}, [\mathit{any}|\mathit{list}] \to \mathit{list}\}$. This is top-down deterministic, but 
not bottom-up deterministic (since $[] \to \mathit{list}$ and $[] \to \mathit{any}$ both occur). Determinizing 
this automaton would result in one that is not top-down deterministic. 

2.1 Analysis Based on Pre-interpretations 

We now define the analysis framework for logic programs. Bottom-up declarative 
semantics captures the set of logical consequences (or a model) of a program. 
The standard, or concrete semantics is based on the Herbrand pre-interpretation. 
The theoretical basis of this approach to static analysis of definite logic programs 
was set out in [5,6] and [7]. We follow standard notation for logic programs [8]. 

Let $P$ be a definite program and $\Sigma$ the signature of its underlying language 
$L$. A pre-interpretation of $L$ consists of 

1. a non-empty domain of interpretation $D$; 

2. an assignment of an $n$-ary function $D^n \to D$ to each $n$-ary function symbol 
in $\Sigma$ ($n \ge 0$). 



Correspondence of FTAs and Pre-interpretations. A pre-interpretation 
with a finite domain $D$ over a signature $\Sigma$ is equivalent to a complete bottom-up 
deterministic FTA over the same signature, as follows. 

1. The domain $D$ is the set of states of the FTA. 

2. Let $f'$ be the function $D^n \to D$ assigned to $f \in \Sigma$ by the pre-interpretation. 
In the corresponding FTA there is a set of transitions $f(d_1, \ldots, d_n) \to d$, for 
each $d_1, \ldots, d_n, d$ such that $f'(d_1, \ldots, d_n) = d$. Conversely the transitions of 
a complete bottom-up deterministic FTA define a function [4]. 
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Semantics Parameterized by a Pre-interpretation. We quote some defi- 
nitions from Chapter 1 of [8]. Let J be a pre-interpretation of L with domain 
D. Let V be a mapping assigning each variable in L to an element of D. A term 
assignment Tj (t) is defined for each term t as follows: 

1. Tj (x) = V(x) for each variable x. 

2. Tj . . . , t n )) = f'{Tj (t i),... ,Tj (t n )), (n > 0) for each non-variable 
term 

f(ti, . . . ,t n ), where /' is the function assigned by J to /. 

Let J be a pre-interpretation of a language L, with domain D , and let p be an n- 
ary function symbol from L. Then a domain atom for J is any atom p[d \, . . . , d n ) 
where di € D, 1 < i < n. Let p(ti , . . . , t n ) be an atom. Then a domain instance 
ofp(ti, ... ,t n ) with respect to J and V is a domain atom p(Tj (ti), . . . ,Tj ( t n )). 
Denote by [A] j the set of all domain instances of A with respect to J and some 
V. 

The definition of domain instance extends naturally to formulas. In particular,
let C be a clause. Denote by $[C]_J$ the set of all domain instances of the
clause with respect to J.
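The term assignment and the set of domain instances can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-element domain, the function `interp` and the tuple encoding of terms are hypothetical choices made for the example, not notation from the paper.

```python
from itertools import product

DOMAIN = ("var", "nonvar")  # hypothetical two-element domain D


def interp(functor, args):
    """Function assigned by J to each function symbol (illustrative choice)."""
    return "var" if functor == "v" and not args else "nonvar"


def term_value(term, valuation):
    """T_J(t). Variables are strings; f(t1,...,tn) is the tuple (f, t1, ..., tn)."""
    if isinstance(term, str):
        return valuation[term]
    return interp(term[0], tuple(term_value(a, valuation) for a in term[1:]))


def variables(term):
    """Variables occurring in a term (atoms are encoded the same way)."""
    if isinstance(term, str):
        return {term}
    return set().union(*(variables(a) for a in term[1:]))


def domain_instances(atom):
    """[A]_J: the domain instances of atom A over all valuations V."""
    names = sorted(variables(atom))
    result = set()
    for values in product(DOMAIN, repeat=len(names)):
        valuation = dict(zip(names, values))
        result.add((atom[0],) + tuple(term_value(a, valuation) for a in atom[1:]))
    return result


# The atom app([X|U], V, [X|W]), with "." as the list constructor.
atom = ("app", (".", "X", "U"), "V", (".", "X", "W"))
instances = domain_instances(atom)
```

With this encoding the first and third arguments always evaluate to `nonvar`, so only the value chosen for V distinguishes the domain instances.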

Core Bottom-Up Semantics Function $T_P^J$. The core bottom-up declarative
semantics is parameterised by a pre-interpretation of the language of the
program. Let P be a definite program, and J a pre-interpretation of the language
of P. Let $Atom_J$ be the set of domain atoms with respect to J. The function
$T_P^J : 2^{Atom_J} \to 2^{Atom_J}$ is defined as follows.

$M^J[P] = \mathrm{lfp}(T_P^J)$: $M^J[P]$ is the minimal model of P with pre-interpretation J.

Concrete Semantics. The usual semantics is obtained by taking J to be the
Herbrand pre-interpretation, which we call H. Thus $Atom_H$ is the Herbrand
base of (the language of) P and $M^H[P]$ is the minimal Herbrand model of P.
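Both the minimal model for an arbitrary J and its Herbrand instance are least fixpoints of the core semantic function, whose displayed definition does not survive in this text. A standard formulation consistent with the definitions of domain atoms and domain instances given earlier would be the following; it is offered as a reconstruction under that assumption, not as a quotation of the original display.

```latex
T_P^J(I) \;=\; \bigl\{\, A' \;\bigm|\;
    A \leftarrow B_1, \ldots, B_n \in P,\;
    A' \leftarrow B'_1, \ldots, B'_n \in [A \leftarrow B_1, \ldots, B_n]_J,\;
    \{B'_1, \ldots, B'_n\} \subseteq I \,\bigr\}
\qquad
M^J[P] \;=\; \mathrm{lfp}\bigl(T_P^J\bigr)
```

Read operationally: a domain atom enters the result whenever some domain instance of a program clause has all of its body atoms already in I.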

The minimal Herbrand model consists of ground atoms. In order to capture
information about the occurrence of variables, we extend the signature
with an infinite set of extra constants $\mathcal{V} = \{v_0, v_1, \ldots\}$. The Herbrand
pre-interpretation over the extended language is called HV. The model
$M^{HV}[P]$ is our concrete semantics.

The elements of $\mathcal{V}$ do not occur in the program or goals, but can appear
in atoms in the minimal model $M^{HV}[P]$. Let C(P) be the set of all atomic
logical consequences of the program P, known as the Clark semantics [9]; that
is, $C(P) = \{A \mid P \models \forall A\}$, where A is an atom. Then $M^{HV}[P]$ is isomorphic
to C(P). More precisely, let $\Omega$ be some fixed bijective mapping from $\mathcal{V}$ to the
variables in L. Let A be an atom; denote by $\Omega(A)$ the result of replacing any
constant $v_j$ in A by $\Omega(v_j)$. Then $A \in M^{HV}[P]$ iff $P \models \forall(\Omega(A))$. By taking the
Clark semantics as our concrete semantics, we can construct abstractions capturing
the occurrence of variables. This version of the concrete semantics is essentially
the same as the one discussed in [7].

In our applications, we will always use pre-interpretations that map all
elements of $\mathcal{V}$ onto the same domain element, say $d_v$. In effect, we do not
distinguish between different variables. Thus, a pre-interpretation includes an
infinite mapping $\{v_0 \mapsto d_v, v_1 \mapsto d_v, \ldots\}$. For such interpretations, we can
take a simpler concrete semantics, in which the set of extra constants $\mathcal{V}$ contains
just one constant v instead of an infinite set of constants. Then pre-interpretations
are defined which include a single mapping $\{v \mapsto d_v\}$ to interpret the extra
constant. Examples are shown in Section 4.

Abstract Interpretations. Let P be a program and J be a pre-interpretation.
Let $Atom_J$ be the set of domain atoms with respect to J. The concretisation
function $\gamma : 2^{Atom_J} \to 2^{Atom_{HV}}$ is defined as $\gamma(S) = \{A \mid [A]_J \subseteq S\}$.

$M^J[P]$ is an abstraction of the atomic logical consequences of P, in the
following sense.

Proposition 1. Let P be a program with signature $\Sigma$, and $\mathcal{V}$ be a set of
constants not in $\Sigma$ (where $\mathcal{V}$ can be either infinite or finite). Let HV be the
Herbrand interpretation over $\Sigma \cup \mathcal{V}$ and J be any pre-interpretation of
$\Sigma \cup \mathcal{V}$. Then $M^{HV}[P] \subseteq \gamma(M^J[P])$.

Thus, by defining pre-interpretations and computing the corresponding least 
model, we obtain safe approximations of the concrete semantics. 

Condensing Domains. The property of being a condensing domain [10] has 
to do with precision of goal-dependent and goal-independent analyses (top-down 
and bottom-up) over that domain. Goal-independent analysis over a condens- 
ing domain loses no precision compared with goal-dependent analysis; this has 
advantages since a single goal-independent analysis can be reused to analyse 
different goals (relatively efficiently) with the same precision as if the individual 
goals were analysed. 

The abstract domain is $2^{Atom_J}$, namely, sets of abstract atoms with respect
to the domain of the pre-interpretation J, with set union as the upper bound
operator. The conditions satisfied by a condensing domain are usually stated in
terms of the abstract unification operation (namely that it should be idempotent
and commutative) and the upper bound $\sqcup$ on the domain (which should satisfy
the property $\gamma(X \sqcup Y) = \gamma(X) \cup \gamma(Y)$). The latter condition is clearly
satisfied ($\sqcup = \cup$ in our domain). Abstract unification is not explicitly present
in our framework. However, we argue informally that the declarative equivalent is
the abstraction of the equality predicate X = Y. This is the set
$\{d = d \mid d \in D_J\}$ where $D_J$ is the domain of the pre-interpretation. This
satisfies an idempotency property, since for example the clause
p(X, Y) ← X = Y, X = Y gives the same result as p(X, Y) ← X = Y. It also satisfies
a relevant commutativity property, namely that the solution to the goal
q(X, Y), X = Y is the same as the solution to q(X, Y), where each clause
q(X, Y) ← B is replaced by q(X, Y) ← X = Y, B. These are informal arguments, but
we also note that the goal-independent analysis yields the least, that is, the
most precise, model for the given pre-interpretation, which provides support for
our claim that domains based on pre-interpretations are condensing.



3 Deriving a Pre-interpretation from Regular Types 

As mentioned above, a pre-interpretation of a language signature $\Sigma$ is equivalent
to a complete bottom-up deterministic FTA over $\Sigma$. An arbitrary FTA can be
transformed to an equivalent complete, bottom-up deterministic FTA. Hence,
we can construct a pre-interpretation starting from an arbitrary FTA.

An algorithm for transforming a non-deterministic FTA (NFTA) to a deter- 
ministic FTA (DFTA) is presented in [4]. The algorithm is shown in a slightly 
modified version. 

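input: NFTA $R = (Q, Q_f, \Sigma, \Delta)$
Set $Q_d$ to $\emptyset$; set $\Delta_d$ to $\emptyset$
repeat
    Set $Q_d$ to $Q_d \cup \{s\}$, $\Delta_d$ to $\Delta_d \cup \{f(s_1, \ldots, s_n) \to s\}$
    where
        $\forall f^n \in \Sigma$, $\forall s_1, \ldots, s_n \in Q_d$, $C = s_1 \times \cdots \times s_n$,
        $s = \{q \in Q \mid \exists (q_1, \ldots, q_n) \in C,\ f(q_1, \ldots, q_n) \to q \in \Delta\}$
until no rule can be added to $\Delta_d$
Set $Q_{df}$ to $\{s \in Q_d \mid s \cap Q_f \neq \emptyset\}$
output: DFTA $R_d = (Q_d, Q_{df}, \Sigma, \Delta_d)$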

Description: The algorithm transforms the NFTA from one that operates on
states to one that operates on sets of states from the NFTA. In the DFTA, the
output of the algorithm, all reachable states in the NFTA are contained in sets
that make up the new states; these are contained in the set $Q_d$. A state in the
NFTA can occur in more than one state in the DFTA. Potentially every non-empty
subset of the set of states of the NFTA can be a state of the DFTA.

The sets in $Q_d$ and the new set of transitions, $\Delta_d$, are generated in an
iterative process. In an iteration of the process, a function f is chosen from
$\Sigma$. Then a number of sets, $s_1, \ldots, s_n$, corresponding to the arity of f, is
selected from $Q_d$; the same set can be chosen more than once. The cartesian
product $s_1 \times \cdots \times s_n$ is then formed, and for each element $(q_1, \ldots, q_n)$
in the cartesian product such that a transition $f(q_1, \ldots, q_n) \to q$ exists, q is
added to a set s. When all elements in the cartesian product have been selected,
the set s is added to $Q_d$ if s is non-empty and not already in $Q_d$. A transition
$f(s_1, \ldots, s_n) \to s$ is added to $\Delta_d$ if s is non-empty.

The algorithm terminates when $Q_d$ is such that no new transitions are added.
Initially $Q_d$ is the empty set, so no set containing a state can be chosen from
$Q_d$ and therefore only the constants (0-ary functions) can be selected on the
first iteration.

Example 6. In Example 3 a non-deterministic FTA is shown;
$\Sigma = \{[\,]^0, [\_|\_]^2, 0^0\}$, $Q = \{list, listlist, any\}$,
$\Delta = \Delta_{any} \cup \{[\,] \to list,\ [any|list] \to list,\ [\,] \to listlist,\ [list|listlist] \to listlist\}$.

A step by step application of the algorithm follows: 

Step 1: $Q_d = \emptyset$, $\Delta_d = \emptyset$. Choose f as a constant, $f = [\,]$. Now
$s = \{q \in Q \mid [\,] \to q \in \Delta\} = \{any, list, listlist\}$. Add s to $Q_d$ and the
transition $[\,] \to \{any, list, listlist\}$ to $\Delta_d$.

Step 2: Choose $f = 0$. Now $s = \{q \in Q \mid 0 \to q \in \Delta\} = \{any\}$. Add s to $Q_d$
and the transition $0 \to \{any\}$ to $\Delta_d$.

Step 3: Choose $f = [\_|\_]$, $s_1 = s_2 = \{any, list, listlist\}$. Now
$s = \{q \in Q \mid \exists q_1 \in s_1, \exists q_2 \in s_2, [q_1|q_2] \to q \in \Delta\} = \{any, list, listlist\}$.
Add s to $Q_d$ and the transition
$[\{any, list, listlist\} \mid \{any, list, listlist\}] \to \{any, list, listlist\}$ to $\Delta_d$.

Step 4: Choose $f = [\_|\_]$, $s_1 = s_2 = \{any\}$. Now
$s = \{q \in Q \mid \exists q_1 \in s_1, \exists q_2 \in s_2, [q_1|q_2] \to q \in \Delta\} = \{any\}$.
Add s to $Q_d$ and the transition $[\{any\} \mid \{any\}] \to \{any\}$ to $\Delta_d$.

Step 5: Choose $f = [\_|\_]$, $s_1 = \{any\}$, $s_2 = \{any, list, listlist\}$. Now
$s = \{q \in Q \mid \exists q_1 \in s_1, \exists q_2 \in s_2, [q_1|q_2] \to q \in \Delta\} = \{any, list\}$.
Add s to $Q_d$ and the transition
$[\{any\} \mid \{any, list, listlist\}] \to \{any, list\}$ to $\Delta_d$.

Step 6: Choose $f = [\_|\_]$, $s_1 = \{any, list, listlist\}$, $s_2 = \{any\}$. Now
$s = \{q \in Q \mid \exists q_1 \in s_1, \exists q_2 \in s_2, [q_1|q_2] \to q \in \Delta\} = \{any\}$.
Add s to $Q_d$ and the transition
$[\{any, list, listlist\} \mid \{any\}] \to \{any\}$ to $\Delta_d$.

Steps 7 to 11: No new sets are added to $Q_d$. New transitions added to $\Delta_d$:
$[\{any, list\} \mid \{any, list\}] \to \{any, list\}$,
$[\{any, list\} \mid \{any, list, listlist\}] \to \{any, list, listlist\}$,
$[\{any, list, listlist\} \mid \{any, list\}] \to \{any, list\}$,
$[\{any\} \mid \{any, list\}] \to \{any, list\}$,
$[\{any, list\} \mid \{any\}] \to \{any\}$.

The set of states $Q_d$ and the transitions $\Delta_d$ in the resulting DFTA are
equivalent to the states and transitions in Example 3: $q_1 = \{any, list, listlist\}$,
$q_2 = \{any, list\}$ and finally $q_3 = \{any\}$.
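The naive procedure traced in Example 6 can be sketched in Python as follows. This is a minimal, unoptimised illustration; the encoding of transitions as `(f, argument_states, state)` triples and the symbol names `nil`, `zero` and `cons` are assumptions made for the sketch, and final states are not computed because they are not needed to obtain a pre-interpretation.

```python
from itertools import product


def determinize(arities, delta):
    """Naive subset construction for a bottom-up FTA.

    arities: dict mapping each function symbol to its arity.
    delta:   set of NFTA transitions (f, (q1, ..., qn), q).
    Returns (states, transitions); DFTA states are frozensets of NFTA states.
    """
    states, transitions = set(), {}
    changed = True
    while changed:
        changed = False
        for f, n in arities.items():
            for args in product(list(states), repeat=n):
                if (f, args) in transitions:
                    continue
                # Collect every q reachable from some element of s1 x ... x sn.
                target = frozenset(
                    q for (g, qs, q) in delta
                    if g == f and all(qi in si for qi, si in zip(qs, args))
                )
                if target:  # empty sets are never added
                    transitions[(f, args)] = target
                    states.add(target)
                    changed = True
    return states, transitions


ARITIES = {"nil": 0, "zero": 0, "cons": 2}
DELTA_ANY = {("nil", (), "any"), ("zero", (), "any"),
             ("cons", ("any", "any"), "any")}
DELTA = DELTA_ANY | {
    ("nil", (), "list"), ("cons", ("any", "list"), "list"),
    ("nil", (), "listlist"), ("cons", ("list", "listlist"), "listlist"),
}

states, transitions = determinize(ARITIES, DELTA)
q1 = frozenset({"any", "list", "listlist"})
q2 = frozenset({"any", "list"})
q3 = frozenset({"any"})
```

Because the input already contains the transitions for `any`, every combination of argument states yields a non-empty target, so the resulting automaton is complete: two transitions for the constants and nine for the binary list constructor.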

In a naive implementation of the algorithm where every combination of
arguments to the chosen f would have to be tested in each iteration, the
complexity lies in forming and testing each element in the cartesian product, for
every combination of states in $Q_d$. It is possible to estimate the number of
operations required in a single iteration of the process, where an operation is
the steps necessary to determine whether $f(q_1, \ldots, q_n) \to q \in \Delta$. Since
$\Delta$ is static, an operation can be considered to be of constant time. The number
of operations can be estimated by the formula $(s \cdot e)^a$, where s is the number
of states in $Q_d$, e is the average number of elements in a single state in $Q_d$
and a is the arity of the chosen f. Every time a state is added to $Q_d$, an
iteration in the algorithm will require additional operations. The worst case is
if the algorithm causes an exponential blow-up in the number of states [4].
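As an illustrative check of the estimate, take the final $Q_d$ of Example 6 and the binary list constructor. The numbers below are worked out here for illustration and do not appear in the original text.

```latex
s = 3, \qquad
e = \frac{|q_1| + |q_2| + |q_3|}{3} = \frac{3 + 2 + 1}{3} = 2, \qquad
a = 2
\quad\Longrightarrow\quad
(s \cdot e)^a = (3 \cdot 2)^2 = 36 .

\text{Exact count: }
\sum_{s_1 \in Q_d} \sum_{s_2 \in Q_d} |s_1| \, |s_2|
  = \Bigl( \sum_{s \in Q_d} |s| \Bigr)^{2}
  = (3 + 2 + 1)^2 = 36 .
```

For a single function symbol the estimate coincides with the exact number of elements tested, since $s \cdot e$ is by definition the sum of the sizes of the sets in $Q_d$.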

Obtaining a Complete FTA: The determinization procedure does not return a
complete FTA in general. We can complete it as outlined in Section 2, by adding
an extra state and corresponding transitions. Another way is to ensure that the
input NFTA accepts every term. We can easily do this by adding the standard
transitions $\Delta_{any}$ to the input NFTA. The output DFTA is then guaranteed to
be complete.

The algorithm’s efficiency can be improved by generating the new states,
$Q_d$, before the new transitions, $\Delta_d$, are generated. Each iteration in the
naive algorithm will redo work from previous iterations, though only combinations
containing a newly added state can result in new states. The transitions can be
generated in one iteration if all states in $Q_d$ are known.

The new states are formed based on the transitions in the NFTA. The NFTA
does not change during the algorithm and a preprocessing of the NFTA can be
used to determine, for a given $f^n$, which states from $Q_d$ can possibly occur as
arguments in transitions: those states in $Q_d$ not containing a state from the
NFTA that occurs as an argument of f cannot result in any new state being
added to $Q_d$.

Experimental results using an optimised version of the above algorithm (to 
be described in detail in a forthcoming paper) show that the algorithm can 
handle automata with hundreds of transitions. Table 3 in Section 5 gives some 
experimental results. 

4 Examples 

In this section we look at examples involving both types and modes. The use- 
fulness of this approach in a binding time analysis (BTA) for offline partial 
evaluation will be shown. We also illustrate the applicability of the domains to
model checking.

We assume that $\Sigma$ includes one special constant v (see Section 2.1). The
standard type any is assumed where necessary (see Example 3), and it includes
the rule $v \to any$.

Definition of Modes as Regular Types. Instantiation modes can be coded
as regular types. In other words, we claim that modes are regular types, and
that this gives some new insight into the relation between modes and types. The
set of ground terms over a given signature, for example, can be described using
regular types, as can the set of non-ground terms, the set of variables, and the set
of non-variable terms. The definitions of the types ground (g) and variable (var)
are g = 0; []; [g|g]; s(g) and var = v respectively. Using the determinization
algorithm, we can derive other modes automatically. For these examples we
assume the signature $\Sigma = \{[\,], [\_|\_], s, 0\}$ with the usual arities, though clearly
the definitions can be constructed for any signature. Different pre-interpretations
are obtained by taking one or both of the modes g and var along with the type
any, and then determinizing. The choices are summarised in Figure 1. We do
not show the transitions, due to lack of space. To give one example, the mode
non-variable in the determinized FTA computed from var and any is given by
the transitions for {any}.

{any} = 0; []; [{any}|{any}]; [{any,var}|{any}]; [{any}|{any,var}];
        [{any,var}|{any,var}]; s({any}); s({any,var})

Input states   Output states                 Corresponding modes
g, var, any    {any,g}, {any,var}, {any}     ground, variable, non-ground-non-variable
g, any         {any,g}, {any}                ground, non-ground
var, any       {any,var}, {any}              variable, non-variable

Fig. 1. Mode pre-interpretations obtained from g, var and any
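The output states for the mode types can be reproduced with a small self-contained subset construction. This is a sketch only: the symbol names `nil`, `zero`, `cons` and the triple encoding of transitions are assumptions of the example, and the naive construction used here is not the optimised algorithm the authors refer to.

```python
from itertools import product


def determinize(arities, delta):
    """Naive subset construction; DFTA states are frozensets of NFTA states."""
    states, transitions = set(), {}
    changed = True
    while changed:
        changed = False
        for f, n in arities.items():
            for args in product(list(states), repeat=n):
                if (f, args) in transitions:
                    continue
                target = frozenset(
                    q for (g, qs, q) in delta
                    if g == f and all(qi in si for qi, si in zip(qs, args))
                )
                if target:
                    transitions[(f, args)] = target
                    states.add(target)
                    changed = True
    return states, transitions


# Signature {[], [_|_], s, 0} extended with the special constant v.
ARITIES = {"nil": 0, "zero": 0, "v": 0, "s": 1, "cons": 2}
ANY = {("nil", (), "any"), ("zero", (), "any"), ("v", (), "any"),
       ("s", ("any",), "any"), ("cons", ("any", "any"), "any")}
GROUND = {("nil", (), "g"), ("zero", (), "g"),
          ("s", ("g",), "g"), ("cons", ("g", "g"), "g")}
VAR = {("v", (), "var")}

modes_all, _ = determinize(ARITIES, ANY | GROUND | VAR)
modes_var, trans_var = determinize(ARITIES, ANY | VAR)

# Transitions whose target is the non-variable state {any}.
nonvar = frozenset({"any"})
nonvar_rules = [key for key, target in trans_var.items() if target == nonvar]
```

For the input `var, any` the sketch yields eight transitions into `{any}`, which agrees in number with the eight alternatives listed for the non-variable mode.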

Let P be the naive reverse program shown below.

rev([], []).        rev([X|U], W) ← rev(U, V), app(V, [X], W).
app([], Y, Y).      app([X|U], V, [X|W]) ← app(U, V, W).

Input types    Model
g, var, any    {rev(g, g), rev(ngnv, ngnv), app(g, var, ngnv), app(g, var, var),
               app(g, g, g), app(g, ngnv, ngnv), app(ngnv, X, ngnv)}
g, any         {rev(g, g), rev(ng, ng), app(g, X, X), app(ng, X, ng)}
var, any       {rev(nv, nv), app(nv, X, X), app(nv, X, nv)}

Fig. 2. Abstract Models of Naive Reverse program



The result of computing the least model of P is summarised in Figure 2, with
the abbreviations ground=g, variable=v, non-ground=ng, non-variable=nv and
non-ground-non-variable=ngnv. An atom containing a variable X in the abstract
model is an abbreviation for the collection of atoms obtained by replacing
X by any element of the abstract domain. The analysis based on g and any is
equivalent to the well-known Pos abstract domain [10], while that based on g,
var and any is the fgi domain discussed in [7]. The presence of var in an
argument indicates possible freeness, or alternatively, the absence of var
indicates definite non-freeness. For example, the answers for rev are definitely
not free, the first argument of app is not free, and if the second argument of app
is not free then neither is the third.
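The model for the input types g and any can be recomputed by iterating the bottom-up semantic function directly over the two-element domain. The sketch below is an assumption-laden illustration: the pre-interpretation is written by hand (a term is ground exactly when all its arguments are ground) rather than obtained by determinization, the clause encoding is hypothetical, and the enumeration of all valuations is deliberately naive.

```python
from itertools import product

DOMAIN = ("g", "ng")


def interp(functor, args):
    """Hand-written pre-interpretation for ground / non-ground."""
    if functor == "v":
        return "ng"
    return "g" if all(a == "g" for a in args) else "ng"


def value(term, env):
    """Term assignment: variables are strings, f(t1,...,tn) is (f, t1, ..., tn)."""
    if isinstance(term, str):
        return env[term]
    return interp(term[0], tuple(value(a, env) for a in term[1:]))


def variables(term):
    if isinstance(term, str):
        return {term}
    return set().union(*(variables(a) for a in term[1:]))


def instance(atom, env):
    return (atom[0],) + tuple(value(a, env) for a in atom[1:])


def tp(clauses, atoms):
    """One application of the semantic function to a set of domain atoms."""
    out = set()
    for head, body in clauses:
        names = sorted(set().union(*(variables(a) for a in [head] + body)))
        for values in product(DOMAIN, repeat=len(names)):
            env = dict(zip(names, values))
            if all(instance(b, env) in atoms for b in body):
                out.add(instance(head, env))
    return out


def least_model(clauses):
    atoms = set()
    while True:
        nxt = tp(clauses, atoms)
        if nxt == atoms:
            return atoms
        atoms = nxt


NIL = ("[]",)


def cons(head, tail):
    return (".", head, tail)


NAIVE_REVERSE = [
    (("rev", NIL, NIL), []),
    (("rev", cons("X", "U"), "W"),
     [("rev", "U", "V"), ("app", "V", cons("X", NIL), "W")]),
    (("app", NIL, "Y", "Y"), []),
    (("app", cons("X", "U"), "V", cons("X", "W")), [("app", "U", "V", "W")]),
]

model = least_model(NAIVE_REVERSE)
```

Expanding the variable X in the second row of Figure 2 gives the same six atoms: two for rev and four for app.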

Combining Modes with Other Types. Consider the usual definition of lists,
namely list = []; [any|list]. Now compute the pre-interpretation derived from
the types list, any and g. Note that list, any and g intersect. The set of disjoint
types is {{any, ground}, {any, list}, {any, ground, list}, {any}} (abbreviated as
{g, ngl, gl, ngnl} corresponding to ground non-lists, non-ground lists, ground
lists, and non-ground-non-lists respectively). The abstract model with respect
to the pre-interpretation is

{rev(gl, gl), rev(ngl, ngl),
 app(gl, X, X), app(ngl, ngnl, ngnl), app(ngl, gl, ngl), app(ngl, ngl, ngl)}

Types for Binding Time Analysis. Binding time analysis (BTA) for offline
partial evaluation in Logen [11] distinguishes between various kinds of term
instantiations. Static corresponds to ground, and dynamic to any. In addition
Logen has the binding type nonvar and user-defined types.

A given set of user types can be determinized together with types represent- 
ing static, dynamic (that is, g and any ) and var. Call types can be computed 
from the abstract model over the resulting pre-interpretation, for example us- 
ing a query-answer transformation (magic sets). This is a standard approach to 
deriving call patterns; [12] gives a clear account and implementation strategy. 

Let P be the following program for transposing a matrix. 



transpose(Xs, []) ← nullrows(Xs).
transpose(Xs, [Y|Ys]) ← makerow(Xs, Y, Zs), transpose(Zs, Ys).

makerow([], [], []).
makerow([[X|Xs]|Ys], [X|Xs1], [Xs|Zs]) ← makerow(Ys, Xs1, Zs).

nullrows([]).
nullrows([[]|Ns]) ← nullrows(Ns).



Let row and matrix be defined as row = []; [any|row] and matrix =
[]; [row|matrix] respectively. These are combined with the standard types g, var
and any. Given an initial call of the form transpose(matrix, any), BTA with
respect to the disjoint types results in the information that every call to the
predicates makerow and transpose has a matrix as first argument. More
specifically, it is derived to have a type {any, matrix, row, g} or
{any, matrix, row}, meaning that it is either a ground or non-ground matrix. Note
that any term of type matrix is also of type row. This BTA is optimal for this set
of types.



Infinite-State Model Checking. The following example is from [13].

gen([0,1]).
gen([0|X]) ← gen(X).
reachable(X) ← gen(X).
reachable(X) ← reachable(Y), trans(Y, X).

trans1([0,1|T], [1,0|T]).
trans1([H|T], [H|T1]) ← trans1(T, T1).
trans2([0], [1]).
trans2([H|T], [H|T1]) ← trans2(T, T1).

trans(X, Y) ← trans1(X, Y).
trans([1|X], [0|Y]) ← trans2(X, Y).



It is a simple model of a token ring transition system. A state of the system is a 
list of processes indicated by 0 and 1 where a 0 indicates a waiting process and 
a 1 indicates an active process. The initial state is defined by the predicate gen
and the predicate reachable defines the reachable states with respect to the
transition predicate trans. The required property is that exactly one process is 
active in any state. The state space is infinite, since the number of processes (the 
length of the lists) is unbounded. Hence finite model checking techniques do not 
suffice. The example was used in [14] to illustrate directional type inference for 
infinite-state model checking. 

We define simple regular types defining the states. The set of “good” states
in which there is exactly one 1 is goodlist. The type zerolist is the set of lists
of zeros. (Note that it is not necessary to give an explicit definition of a “bad”
state.)

one = 1      goodlist = [zero|goodlist]; [one|zerolist]
zero = 0     zerolist = []; [zero|zerolist]

Determinization of the given types along with any results in five states
representing disjoint types: {any, one}, {any, zero}, the good lists
{any, goodlist}, the lists of zeros {any, zerolist} and all other terms {any}. We
abbreviate these as one, zero, goodlist, zerolist and other respectively. The
least model of the above program over this domain is as follows.

gen(goodlist)
trans1(goodlist, goodlist), trans1(other, other)
trans2(goodlist, goodlist), trans2(goodlist, other), trans2(other, other)
trans(goodlist, goodlist), trans(other, other)
reachable(goodlist)

The key property of the model is the presence of reachable(goodlist) (and the 
absence of other atoms for reachable), indicating that if a state is reachable 
then it is a goodlist. Note that the transitions will handle other states, but in 
the context in which they are invoked, only goodlist states are propagated. In 
contrast to the use of set constraints or directional type inference to solve this 
problem, no goal-directed analysis is necessary. Thus there is no need to define 
an “unsafe” state and show that it is unreachable. 
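To make the five disjoint types concrete, the sketch below evaluates ground states of the token ring under a pre-interpretation written by hand from the type definitions of one, zero, goodlist and zerolist. The transition table is an assumption derived from those definitions, not the output of the authors' determinization, and the function only classifies individual states; it does not compute the least model.

```python
# Hand-written table for the list constructor: (head state, tail state) -> state.
# Every combination not listed goes to "other".
CONS_TABLE = {
    ("zero", "zerolist"): "zerolist",
    ("zero", "goodlist"): "goodlist",
    ("one", "zerolist"): "goodlist",
}


def cons(head_state, tail_state):
    return CONS_TABLE.get((head_state, tail_state), "other")


def classify(term):
    """Abstract value of a ground term built from 0, 1 and Python lists."""
    if isinstance(term, list):
        state = "zerolist"                      # the empty list
        for element in reversed(term):          # fold from the tail, bottom-up
            state = cons(classify(element), state)
        return state
    if term == 0:
        return "zero"
    if term == 1:
        return "one"
    return "other"


initial = classify([0, 1])            # the fact gen([0,1])
after_step = classify([1, 0, 0])      # [0,1|T] rewritten to [1,0|T] with T = [0]
two_tokens = classify([1, 1])         # more than one active process
```

A state with exactly one 1 evaluates to `goodlist`, while a state with two active processes falls into `other`, which is the distinction the required property relies on.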

In summary, the examples show that accurate mode analysis can be per- 
formed, and that modes can be combined with arbitrary user defined types. 
Types can be used to prove properties expressible by regular types. Note that 
no assumption needs to be made that programs are well-typed; the programmer 
does not have to associate types with particular argument positions. 

5 Implementation and Complexity Issues 

The implementation is based on two components; the FTA determinization algo- 
rithm described in Section 3, which yields a pre-interpretation, and the compu- 
tation of the least model of the program with respect to that pre-interpretation. 

We have designed a much faster version of the determinization algorithm pre- 
sented in Section 3. Clearly the worst-case number of states in the determinized 
FTA is exponential, but our algorithm exploits the structure of the given FTA 
to reduce the computation. Nevertheless the scalability of the determinization 
algorithm is a critical topic for future study and experiment. A forthcoming pub- 
lication will describe our algorithm and its performance for typical FTAs. We 
note that, although the states in the determinized FTA are formed from subsets 
of the powerset of the set of states in the input FTA, most of the subsets are 
empty in the examples we have examined. This is because there are many cases 
of subtypes and disjoint types among the given types. 

The number of transitions in the determinized FTA can increase rapidly,
even when the number of states does not, due to the fact that the output is a
complete FTA. Hence, for each n-ary function, there are $m^n$ transitions, if there
are m states in the determinized automaton. We can alleviate the complexity
greatly by making use of “don’t care” arguments of functions in the transitions,
of which there are usually several, especially in the transitions for the {any}
state, which represents terms that are not of any other type. If there exists an
n-ary function f and states $q_1, \ldots, q_{j-1}, q_{j+1}, \ldots, q_n, q$ such that for all
states $q_j$, there is a transition $f(q_1, \ldots, q_j, \ldots, q_n) \to q$, then we can
represent all such transitions by the single transition
$f(q_1, \ldots, q_{j-1}, X, q_{j+1}, \ldots, q_n) \to q$. The jth argument is called a
don’t care argument. Our algorithm generates the transitions of the determinized
FTA with some “don’t care” arguments (though not all the possible don’t cares are
generated in the current version), which is critical for the scalability of the
model computation.
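The don't care condition can be illustrated with a small post-processing pass over a complete transition table. This is a hypothetical sketch, not the authors' generation procedure: the function `collapse`, the `"_"` marker and the two-state ground / non-ground table are assumptions chosen for the example.

```python
def collapse(transitions, states, j):
    """Merge transitions that reach the same target for every state in argument j.

    transitions: dict mapping (f, (q1, ..., qn)) to the target state.
    Returns a new dict in which merged groups carry "_" in position j.
    """
    groups, result = {}, {}
    for (f, args), target in transitions.items():
        if j >= len(args):                       # symbol has no j-th argument
            result[(f, args)] = target
            continue
        pattern = args[:j] + ("_",) + args[j + 1:]
        groups.setdefault((f, pattern), {})[args[j]] = target
    for (f, pattern), by_state in groups.items():
        targets = set(by_state.values())
        if set(by_state) == set(states) and len(targets) == 1:
            result[(f, pattern)] = targets.pop()  # one don't care transition
        else:
            for state, target in by_state.items():
                args = pattern[:j] + (state,) + pattern[j + 1:]
                result[(f, args)] = target
    return result


STATES = ("g", "ng")
# Complete table for the list constructor: ground iff both arguments are ground.
CONS = {("cons", (a, b)): ("g" if a == b == "g" else "ng")
        for a in STATES for b in STATES}

compact = collapse(CONS, STATES, 0)
```

Here the two transitions whose second argument is `ng` agree for every first argument and are merged, so four transitions become three; the saving grows with the number of states and the arity.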

Abstract Compilation of a Pre-interpretation. The idea of abstract com- 
pilation was introduced first by Debray and Warren [15]. Operations on the 
abstract domain are coded as logic programs and added directly to the target 
program, which is then executed according to standard concrete semantics. The 
reason for this technique is to avoid some of the overhead of interpreting the 
abstract operations. 

A pre-interpretation can be defined by a predicate →/2 defining the FTA
transitions. We introduce the predicate →/2 directly into the program to be
analysed, as follows. Each clause of the program is transformed
by repeatedly replacing non-variable terms occurring in the clause, of the form
$f(x_1, \ldots, x_m)$ where $x_1, \ldots, x_m$ ($m \ge 0$) are variables, by a fresh variable u
and adding the atom $f(x_1, \ldots, x_m) \to u$ to the clause body, until the only
non-variables in the clause occur in the first argument of →. If P is the original
program, the transformed program is called $\bar{P}$.

When a specific pre-interpretation J is added to $\bar{P}$, the result is a domain
program for J, called $\bar{P}^J$. Clearly $\bar{P}^J$ has a different language than P, since
the definition of →/2 contains elements of the domain of interpretation. It can
easily be shown that the least model $M^J[P] = \mathrm{lfp}(T_P^J)$ is obtained by computing
$\mathrm{lfp}(T_{\bar{P}^J})$, and then restricting to the predicates in P (that is, omitting
the predicate →/2 which was introduced in the abstract compilation). An example of
the domain program for append and the pre-interpretation for
variable/non-variable is shown below. (Note that don’t care arguments are used in
the definition of →/2.)

app(U, Y, Y) ← [] → U.
app(U, Y, V) ← app(Xs, Y, Zs), [X|Xs] → U, [X|Zs] → V.

v → var.     [] → nonvar.     [_|_] → nonvar.
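The least model of a domain program of this shape can be computed by a naive fixpoint iteration. The following Python sketch is illustrative only: the clause encoding, the `"->"` tag and the function names are assumptions, the variable names in the second clause are chosen for the example, and no optimisation is attempted.

```python
from itertools import product

DOMAIN = ("var", "nonvar")


def arrow(functor, args):
    """The pre-interpretation for variable / non-variable, with don't care arguments."""
    return "var" if functor == "v" else "nonvar"


# An ordinary atom is (predicate, var, ...); the atom ("->", f, (vars...), result)
# stands for f(vars) -> result in the abstractly compiled clause.
DOMAIN_PROGRAM = [
    (("app", "U", "Y", "Y"), [("->", "[]", (), "U")]),
    (("app", "U", "Y", "V"), [("app", "Xs", "Y", "Zs"),
                              ("->", "cons", ("X", "Xs"), "U"),
                              ("->", "cons", ("X", "Zs"), "V")]),
]


def clause_vars(head, body):
    names = set(head[1:])
    for atom in body:
        if atom[0] == "->":
            names |= set(atom[2]) | {atom[3]}
        else:
            names |= set(atom[1:])
    return sorted(names)


def holds(atom, env, model):
    if atom[0] == "->":
        _, functor, args, result = atom
        return arrow(functor, tuple(env[a] for a in args)) == env[result]
    return (atom[0],) + tuple(env[a] for a in atom[1:]) in model


def least_model(clauses):
    model = set()
    while True:
        new = set(model)
        for head, body in clauses:
            names = clause_vars(head, body)
            for values in product(DOMAIN, repeat=len(names)):
                env = dict(zip(names, values))
                if all(holds(a, env, model) for a in body):
                    new.add((head[0],) + tuple(env[a] for a in head[1:]))
        if new == model:
            return model
        model = new


model = least_model(DOMAIN_PROGRAM)
```

Since the arrow predicate is evaluated through `arrow` rather than stored as facts, the returned model already contains only atoms for the predicates of the original program.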



Computation of the Least Domain Model. The computation of the least 
model is an iterative fixpoint algorithm. The iterations of the basic fixpoint 
algorithm, which terminates when a fixed point is found, can be decomposed 
into a sequence of smaller fixpoint computations, one for each strongly connected 
component (SCC) of the program’s predicate dependency graph. These can be 
computed in linear time [16]. In addition to the SCC optimisation, our implemen- 
tation incorporates a variant of the semi-naive optimisation [17], which makes 
use of the information about new results on each iteration. A clause body con- 
taining predicates whose models have not changed on some iteration need not 
be processed on the next iteration. 
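The decomposition into strongly connected components can be sketched with Tarjan's algorithm, which is one standard linear-time method; whether it is the method of [16] is not stated here, and the graph encoding below is an assumption of the example. With edges pointing from a predicate to the predicates in its clause bodies, components are emitted callees first, which is the order in which the smaller fixpoints would be computed.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps each node to the nodes it depends on."""
    index, low, on_stack, stack, components = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:               # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components


# Predicate dependency graph of the naive reverse program.
DEPENDS_ON = {"rev": ["rev", "app"], "app": ["app"]}
order = strongly_connected_components(DEPENDS_ON)
```

For naive reverse the model of app is completed before rev is considered, so the clauses for app need not be revisited while iterating on rev.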

Experimental Results. Figure 3 shows a few experimental results (space does
not permit more). For each program, the table shows the number of clauses and
the number of function symbols. The time to perform the determinization and 
compute the least model is shown. Timings were obtained using Ciao Prolog 
running on a machine with 4 Intel Xeon 2 GHz processors and 1 GByte of mem- 
ory. The determinization algorithm currently does not find all the “don’t care” 
arguments. Insertion of don’t care values by hand indicates that the method 
scales better when this is done. More generally, finding efficient representations 
of sets of domain atoms is a critical factor in scalability. For two-element pre- 
interpretations such as Pos, BDDs [18] or multi-headed clauses [19] can be used. 

6 Related Work and Conclusions 

Prior work on propagating type information in logic programs goes back to 
[20] and [21]. Our work can be seen partly as extending and generalising the 
approach of Codish and Demoen [22]. Analysis of logic programs based on types
was performed by Codish and Lagoon [23]. Their approach was similar in that
given types were used to construct an abstract domain. However their types were
quite restricted; each function symbol had to be of exactly one type (which is
even more restrictive than top-down deterministic FTAs). Hence several of the
applications discussed in this paper are not possible, such as modes, or types such
as the goodlist type of Example 4. On the other hand, their approach used a
more complex abstract domain, using ACI unification to implement the domain 
operations, which allowed polymorphic dependencies to be derived. Like our 
approach, their domain was condensing. 

Work on regular type inference is complementary to our method. The types 
used as input in this paper could be derived by a regular type inference, or 
set constraints. One possible use for the method of this paper would be to en- 
hance the precision given by regular type inference. For example, (bottom-up) 
regular type inference derives the information that the first argument of rev/2
in the naive reverse program is a list; using a pre-interpretation derived from
the inferred type, it can then be shown that the second argument is also a list.
This approach could be used to add precision to regular type inference and
set constraint analysis, which are already promising techniques in infinite state
model-checking [14].

Applications in binding time analysis for offline partial evaluation have been 
investigated, with promising results. As noted in Section 4 various mode analyses 
can be reproduced with this approach, including Pos analysis [24].

Abstract. We study termination of logic programs with dynamic sched-
uling, as it can be realised using delay declarations. Following previous 
work, our minimum assumption is that derivations are input-consuming, 
a notion introduced to define dynamic scheduling in an abstract way. 
Since this minimum assumption is sometimes insufficient to ensure ter- 
mination, we consider here various additional assumptions on the permis- 
sible derivations. In one dimension, we consider derivations parametrised 
by any property that the selected atoms must have, e.g. being ground 
in the input positions. In another dimension, we consider both local and 
non-local derivations. In all cases, we give sufficient criteria for termina- 
tion. The dimensions can be combined, yielding the most comprehensive 
approach so far to termination of logic programs with dynamic schedul- 
ing. For non-local derivations, the termination criterion is even necessary. 



1 Introduction 

Termination of logic programs has been widely studied for the LD selection rule, 
i.e., derivations where the leftmost atom in a query is selected [1,4,8-11,16]. 
This rule is adequate for many applications, but there are situations, e.g., in the 
context of parallel executions or the test-and-generate paradigm, that require 
dynamic scheduling, i.e., some mechanism to determine at runtime which atom 
is selected. Dynamic scheduling can be realised by delay declarations [12,23], 
specifying that an atom must be instantiated to a certain degree to be selected. 

Termination of logic programs with dynamic scheduling has been studied for 
about a decade [3,6,7,13-15,17,19,22], starting with observations of surpris- 
ingly complex (non-)termination behaviour of simple programs such as APPEND 
or PERMUTE with delay declarations [17]. In our own research [7, 19, 22], we found 
that modes (input and output), while arguably compromising the “pure logical” 
view of logic programming, are the key to understanding this behaviour and 
achieving or verifying termination. We have proposed input-consuming
derivations (where in each resolution step, the input arguments of the selected atom
do not become instantiated) as a reasonable minimum assumption about the 
selection rule, abstracting away from the technicalities of delay declarations. 

In this paper, we study termination of logic programs for input-consuming 
derivations with various additional assumptions about the selection rule, e.g.
saying that the selected atom must be ground in its input positions. Some
authors have considered termination under such strong assumptions [13-15, 17, 22],
partly because certain programs do not terminate for input-consuming deriva- 
tions, but also because termination for input-consuming derivations is so hard 
to show: termination proofs usually use level mappings, which measure the size 
of an atom; an atom is bounded if its level is invariant under instantiation. Now, 
the usual reasoning that there is a decrease in level mapping between a clause 
head and the clause body atoms does not readily apply to derivations where 
selected atoms are not necessarily bounded. 

After intensive studies of the semantics [6], and restricting to a class of pro- 
grams that is well-behaved wrt. the modes, we now have a sufficient and neces- 
sary criterion for termination of input-consuming derivations [7]. The key con- 
cept of that approach is a special notion of model, bottom-up computable by a 
variant of the well-known $T_P$-operator. The notion reflects the answer
substitutions computed by input-consuming derivations. We build on this work here.

We consider additional assumptions in two dimensions. Certain additional 
assumptions about derivations, e.g. the one above, can be formulated in terms of 
the selected atoms alone. We do this abstractly by saying that each selected atom 
must have a property $\mathcal{P}$. There are some natural conditions on $\mathcal{P}$, stated in Def. 5.
It turns out that the approach of [7] can be easily adapted to give a sufficient
criterion for termination of $\mathcal{P}$-derivations [20]. The semantic notions (model) of
[7] could be used without change. In this paper we give a criterion that is also 
necessary. To this end, the approach of [7] required some small modifications in 
many places. More specifically, the model notion had to be modified. 

Other additional assumptions about derivations cannot be expressed in terms 
of the selected atoms alone. We consider here one such assumption, that of 
derivations being local , meaning that in a resolution step, the most recently 
introduced atoms must be resolved first [14]. This is not a property of a single 
atom, but of atoms in the context of a derivation. To deal with local selection 
rules, we modify the model notion of [7] so that it reflects the substitutions 
computed by local derivations. Based on such models, we can give a sufficient 
criterion for termination of local derivations, parametrised by a $\mathcal{P}$ as before.

We thus present a framework for showing termination of logic programs with 
dynamic scheduling. The initial motivation for this work was our impression 
that while stronger assumptions than that of input-consuming derivations are 
sometimes required, locality is too strong. More specifically, by instantiating the 
framework appropriately, we can now make the following five points: 

1. There is a class of recursive clauses, using a natural pattern of programming,
that narrowly misses the property of termination for input-consuming
derivations. Put simply, these clauses have the form p(X) <- q(X, Y), p(Y),
where the mode is p(input), q(input, output). Due to the variable in the
head, it follows that an atom using p may always be selected, and hence we
have non-termination. Sometimes, just requiring the argument of p to be at
least non-variable is enough to ensure termination. This can be captured by
setting (the relevant subset of) $\mathcal{P}$ to {p(t) | t is non-variable} (Ex. 17).
2. Some programs require for termination that selected atoms must be bounded
wrt. a level mapping |.|. This is related to speculative output bindings, and
the PERMUTE program is the standard example [17]. This can be captured in
our approach by setting $\mathcal{P}$ to the set of bounded atoms (Ex. 14).

3. For some programs it is useful to consider “hybrid” selection rules, where 
differently strong assumptions are made for different predicates. For exam- 
ple, one might require ground input positions for some predicates but no 
additional assumptions for other predicates. This can be captured by setting 
$\mathcal{P}$ accordingly (Ex. 18).

4. A method for showing termination of programs with delay declarations has 
been proposed in [14], assuming local selection rules. In our opinion, this 
assumption is unsatisfactory. No implementation of local selection rules is 
mentioned. Local selection rules do not permit any coroutining. But most 
importantly, while “the class of local selection rules [. . . ] supports simple 
tools for proving termination” [14], in practice, it does not seem to make 
programs terminate that would not terminate otherwise. In fact, we can show 
termination for PERMUTE without requiring local selection rules (Ex. 14) . 

5. In spite of point 4, there are programs that crucially rely on the assumption of 
local selection rules for termination. We are only aware of artificial examples, 
but our treatment of local selection rules helps to understand the role this 
assumption plays in proving termination and why this assumption is not 
required for more realistic examples (Ex. 23). 

The rest of this paper is organised as follows. The next section gives some
preliminaries. In Sec. 3, we adapt the semantics approach of [7] to $\mathcal{P}$-derivations.
In Sec. 4, we study termination for such derivations. In Sec. 5, we adapt our 
approach to local selection rules. In Sec. 6, we conclude. 

2 Preliminaries 

We assume familiarity with the basic notions and results of logic programming
[1]. For $m, n \in \mathbb{N}_0$, $m < n$, the set $\{m, \ldots, n\}$ is denoted by $[m..n]$. For
any kind of object that we commonly denote with a certain letter, we use the 
same letter in boldface to denote a finite sequence of such objects [1] . 

We denote by Term and Atom the set of terms and atoms of the language 
in which the programs and queries in question are written. The arity n of a 
predicate symbol p is indicated by writing p/n. We use typewriter font for logical 
variables, e.g. X, Ys, and lower case letters for arbitrary terms, e.g. t,s,xs. 

For any syntactic object o, we denote by Vars(o) the set of variables occurring 
in o. A syntactic object is linear if every variable occurs in it at most once. 

A substitution is a finite mapping from variables to terms. The domain
(resp., set of variables in the range) of a substitution $\sigma$ is denoted as $Dom(\sigma)$
(resp., $Ran(\sigma)$). We denote by $\sigma|_o$ the restriction of a substitution $\sigma$ to $Vars(o)$.
The result of the application of a substitution $\sigma$ to a term $t$ is called an instance
of $t$ and it is denoted by $t\sigma$. We say $t$ is a variant of $t'$ if $t$ and $t'$ are instances
of each other. A substitution $\sigma$ is a unifier of terms $t$ and $t'$ if $t\sigma = t'\sigma$. We
denote by $mgu(t, t')$ any most general unifier (mgu) of $t$ and $t'$.

A query is a finite sequence of atoms $A_1, \ldots, A_m$, denoted $\Box$ when $m = 0$.
A clause is a formula $H \leftarrow \mathbf{B}$ where $H$ is an atom (the head) and $\mathbf{B}$ is a
query (the body). $H \leftarrow \Box$ is simply written $H \leftarrow$. A program is a finite set
of clauses. We denote atoms by $H, A, B, C, D, E$, queries by $Q, \mathbf{A}, \mathbf{B}, \mathbf{C}, \mathbf{D}, \mathbf{E}$,
clauses by $c, d$ and programs by $P$. We often suppress reference to $P$.

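If $\mathbf{A}, B, \mathbf{C}$ is a query and $c = H \leftarrow \mathbf{B}$ is a fresh variant of a clause, and $B$
and $H$ unify with mgu $\sigma$, then $(\mathbf{A}, \mathbf{B}, \mathbf{C})\sigma$ is called a resolvent of $\mathbf{A}, B, \mathbf{C}$ and
$H \leftarrow \mathbf{B}$ with selected atom $B$ and mgu $\sigma$. We call
$\mathbf{A}, B, \mathbf{C} \stackrel{\sigma}{\Longrightarrow}_c (\mathbf{A}, \mathbf{B}, \mathbf{C})\sigma$
a derivation step, in short: step. If $c$ is irrelevant then we drop the reference.
A sequence $\delta = Q_0 \stackrel{\sigma_1}{\Longrightarrow}_{c_1} Q_1 \stackrel{\sigma_2}{\Longrightarrow}_{c_2} \cdots$ is called a derivation of $P \cup \{Q_0\}$.

If $\delta = Q_0 \stackrel{\sigma_1}{\Longrightarrow}_{c_1} \cdots \stackrel{\sigma_n}{\Longrightarrow}_{c_n} Q_n$ is a finite derivation, we also denote it as
$\delta = Q_0 \stackrel{\sigma}{\longrightarrow} Q_n$ where $\sigma = \sigma_1 \cdots \sigma_n$. We call $len(\delta) = n$ the length of $\delta$.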

2.1 Moded Programs 

For a predicate $p/n$, a mode is an atom $p(m_1, \ldots, m_n)$, where $m_i \in \{I, O\}$ for
$i \in [1..n]$. Positions with $I$ ($O$) are called input (output) positions of $p$. To
simplify the notation, an atom p(s,t) means: s is the vector of terms filling in 
the input positions, and t is the vector of terms filling in the output positions. 

We assume that the mode of each predicate is unique. One way of ensuring 
this is to rename predicates whenever multiple modes are desired. 

Several notions of “modedness” have been proposed, e.g. nicely-modedness 
and well-modedness [1]. We assume here simply moded programs [2], a special 
case of nicely moded programs. Most practical programs are simply moded [7], 
although we will also give an example of a clause that is not. 

Note that the use of the letters s and t is reversed for clause heads. We 
believe that this notation naturally reflects the data flow within a clause. 

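Definition 1. A clause $p(\mathbf{t}_0, \mathbf{s}_{n+1}) \leftarrow p_1(\mathbf{s}_1, \mathbf{t}_1), \ldots, p_n(\mathbf{s}_n, \mathbf{t}_n)$ is simply
moded if $\mathbf{t}_1, \ldots, \mathbf{t}_n$ is a linear vector of variables and for all $i \in [1..n]$

$$Vars(\mathbf{t}_i) \cap Vars(\mathbf{t}_0) = \emptyset \quad\text{and}\quad Vars(\mathbf{t}_i) \cap \bigcup_{j=1}^{i} Vars(\mathbf{s}_j) = \emptyset.$$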

A query B is simply moded if the clause dummy <— B is simply moded. A 
program is simply moded if all of its clauses are simply moded. 

Thus, a clause is simply moded if the output positions of body atoms are filled 
in by distinct variables, and every variable occurring in an output position of a 
body atom does not occur in an earlier input position. 
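
The conditions of Def. 1 are purely syntactic, so they can be checked mechanically. The following is a minimal sketch in Python, not taken from the paper: the term encoding (a `Var` class for variables, tuples `(functor, arg1, ..., argn)` for compound terms) and the function names `variables` and `is_simply_moded` are assumptions made for illustration. A clause is passed with its arguments already split by mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A logical variable, identified by its name (hypothetical encoding)."""
    name: str


def variables(term):
    """Return the set of variables occurring in a term.

    Compound terms are tuples (functor, arg1, ..., argn); anything else
    that is not a Var is treated as a constant.
    """
    if isinstance(term, Var):
        return {term}
    if isinstance(term, tuple):
        found = set()
        for arg in term[1:]:
            found |= variables(arg)
        return found
    return set()


def is_simply_moded(head_inputs, body):
    """Check the conditions of Def. 1.

    head_inputs: the terms t_0 in the input positions of the head.
    body: list of (inputs, outputs) pairs (s_i, t_i), one per body atom,
          in clause order.
    """
    outputs = [t for _, outs in body for t in outs]
    # t_1, ..., t_n must be a linear vector of variables.
    if not all(isinstance(t, Var) for t in outputs):
        return False
    if len(set(outputs)) != len(outputs):
        return False
    head_vars = set()
    for t in head_inputs:
        head_vars |= variables(t)
    earlier_inputs = set()
    for ins, outs in body:
        for s in ins:
            earlier_inputs |= variables(s)  # now covers s_1, ..., s_i
        if set(outs) & (head_vars | earlier_inputs):
            return False
    return True


X, Xs, Ys, Zs = Var("X"), Var("Xs"), Var("Ys"), Var("Zs")

# reverse([X|Xs],Ys) <- append(Zs,[X],Ys), reverse(Xs,Zs)
# in mode reverse(O,I), append(O,O,I): the output term [X] is not a variable.
bad = is_simply_moded([Ys], [([Ys], [Zs, (".", X, "[]")]), ([Zs], [Xs])])

# permute(Ys,[X|Xs]) <- delete(Ys,Zs,X), permute(Zs,Xs)
# in mode permute(I,O), delete(I,O,O).
good = is_simply_moded([Ys], [([Ys], [Zs, X]), ([Zs], [Xs])])
```

Under this encoding, `bad` is `False` because `[X]` fills an output position, and `good` is `True`.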

As an example of a clause that is not simply moded, consider the clause 

reverse([X|Xs],Ys) <- append(Zs,[X],Ys), reverse(Xs,Zs).

in mode reverse(O, I), append(O, O, I): [X] is not a variable. In Ex. 16, we give
a slightly modified version of the NAIVE_REVERSE program that is simply moded.
Robustly typed programs [22] are in some sense a generalisation of simply moded 
programs, and include the above clause. However, the results of this paper have 
so far not been generalised to robustly typed programs. 

2.2 Norms and Level Mappings

Proofs of termination usually rely on the notions of norm and level mapping for 
measuring the size of terms and atoms. These concepts were originally defined 
for ground objects [10], but here we define them for arbitrary objects (in [18], we 
call such norms and level mappings generalised) . To show termination of moded 
programs, it is natural to use moded level mappings, where the level of an atom 
depends only on its input positions [7]. 

Definition 2. A norm is a function $|.| : Term \to \mathbb{N}_0$, and a level mapping
is a function $|.| : Atom \to \mathbb{N}_0$, both invariant under renaming. A moded level
mapping is a level mapping where for any $\mathbf{s}$, $\mathbf{t}$ and $\mathbf{u}$, $|p(\mathbf{s},\mathbf{t})| = |p(\mathbf{s},\mathbf{u})|$.

An atom $A$ is bounded wrt. the level mapping $|.|$ if there exists $k \in \mathbb{N}$ such
that for every substitution $\sigma$, we have $k > |A\sigma|$.

Our method of showing termination, following [7], inherently relies on measuring 
the size of atoms that are not bounded. In Def. 13, a decrease in level mapping 
must be shown (also) for such atoms. So it is important to understand that 
stating |A| = k is different from stating that A is bounded by k. 

One commonly used norm is the term size norm, defined as 

    $|f(t_1, \ldots, t_n)| = 1 + |t_1| + \ldots + |t_n|$ if $n > 0$,

    $|t| = 0$ if $t$ constant/variable.

Another widely used norm is the list-length function, defined as

    $|[t|ts]| = 1 + |ts|$,

    $|t| = 0$ if $t \neq [\_|\_]$ (in particular, if $t$ variable).

For a nil-terminated list $[t_1, \ldots, t_n]$, the list-length is $n$.
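
To make the two norms and the notion of boundedness concrete, here is a small Python sketch. It is illustrative only: the term encoding (a `Var` class, tuples for compound terms, the constant `"[]"` for the empty list) and all function names are assumptions, not notation from the paper. The helper `bounded_under_list_length` relies on the observation that the list-length of a term can grow under instantiation exactly when its list spine ends in a variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A logical variable (hypothetical encoding)."""
    name: str


NIL = "[]"  # the empty list, encoded as a constant


def cons(head, tail):
    """Build the list cell [head|tail] as a compound term."""
    return (".", head, tail)


def is_cell(term):
    return isinstance(term, tuple) and len(term) == 3 and term[0] == "."


def term_size(term):
    """Term size norm: 1 + sizes of the arguments for a compound term, else 0."""
    if isinstance(term, tuple) and len(term) > 1:
        return 1 + sum(term_size(arg) for arg in term[1:])
    return 0


def list_length(term):
    """List-length norm: 1 + |tail| for a list cell, 0 for anything else."""
    n = 0
    while is_cell(term):
        n += 1
        term = term[2]
    return n


def bounded_under_list_length(term):
    """True if no instantiation of term can increase its list-length."""
    while is_cell(term):
        term = term[2]
    return not isinstance(term, Var)


closed = cons("a", cons("b", NIL))   # [a, b]
partial = cons("a", Var("Xs"))       # [a|Xs]
```

Here `list_length(partial)` is 1, but `partial` is not bounded, since instantiating `Xs` can make the list arbitrarily long. This is the distinction between stating $|A| = k$ and stating that $A$ is bounded by $k$.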

2.3 Selection Rules in the Literature 

A selection rule is some rule stating which atom in a query may be selected in 
each step. We do not give any formal definition here; instead we define various 
kinds of derivations and state our formal results in terms of those. 

The notion of input-consuming derivation was introduced in [19] as a formalism
for describing dynamic scheduling in an abstract way.

Definition 3. A derivation step $\mathbf{A}, p(\mathbf{s},\mathbf{t}), \mathbf{C} \stackrel{\sigma}{\Longrightarrow} (\mathbf{A}, \mathbf{B}, \mathbf{C})\sigma$ is
input-consuming if $\mathbf{s}\sigma = \mathbf{s}$. A derivation is input-consuming if all its steps are
input-consuming.

Local derivations were treated in [14]. Consider a query, containing atoms $A$ and
$B$, in a derivation $\xi$. Then $A$ is introduced more recently than $B$ if the step
introducing $A$ comes after the step introducing $B$ in $\xi$.

Definition 4. A derivation is local if in each step, there is no more recently 
introduced atom in the current query than the selected atom. 

Intuitively, in a local derivation, once an atom is selected, that atom must be 
resolved away completely before any of its siblings may be selected. 
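
Locality depends only on when each atom entered the query, so it can be checked on an abstract trace of a derivation. The sketch below is an assumption-laden illustration in Python: atoms are represented by unique labels, a step is a pair (selected atom, atoms introduced by the clause body), and substitutions are ignored. The function name `is_local` is hypothetical.

```python
def is_local(initial_query, steps):
    """Check Def. 4 on an abstract derivation trace.

    initial_query: list of unique atom labels.
    steps: list of (selected, new_atoms) pairs; new_atoms are the labels of
           the body atoms that replace the selected atom.
    """
    born = {atom: 0 for atom in initial_query}  # step that introduced each atom
    query = list(initial_query)
    for k, (selected, new_atoms) in enumerate(steps, start=1):
        if selected not in query:
            raise ValueError(f"{selected!r} is not in the current query")
        # No atom in the query may be more recent than the selected one.
        if any(born[atom] > born[selected] for atom in query):
            return False
        position = query.index(selected)
        query[position:position + 1] = list(new_atoms)
        for atom in new_atoms:
            born[atom] = k
    return True


# permute -> delete, permute -> (delete, call_late), permute -> ...
local_trace = [
    ("p0", ["d1", "p1"]),
    ("d1", ["d2", "c2"]),
    ("d2", []),
    ("c2", []),
    ("p1", []),
]
# Selecting p1 while d2 and c2 (introduced later) are still unresolved.
non_local_trace = [
    ("p0", ["d1", "p1"]),
    ("d1", ["d2", "c2"]),
    ("p1", ["d3", "p3"]),
]
```

`is_local(["p0"], local_trace)` holds, whereas the second trace is rejected at its third step.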

3 Input-Consuming $\mathcal{P}$-Derivations

We consider derivations restricted by some property $\mathcal{P}$ of the selectable atoms.
There are two conditions on $\mathcal{P}$. Some of our results would hold without these
conditions, but the conditions are so natural that we do not bother with this. 

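Definition 5. A $\mathcal{P}$-derivation is a derivation such that each selected atom is
in $\mathcal{P}$, where

1. $\mathcal{P}$ is a set of atoms closed under instantiation;

2. for any $\mathbf{s}$, $\mathbf{t}$ and $\mathbf{u}$, $p(\mathbf{s},\mathbf{t}) \in \mathcal{P}$ implies $p(\mathbf{s},\mathbf{u}) \in \mathcal{P}$.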

Note that the atoms of a simply moded query have variables in their output posi- 
tions, and so it would clearly be pathological to require a particular instantiation 
of the output arguments of an atom for that atom to be selected. 

This is the first published work introducing the concept of $\mathcal{P}$-derivations.
Of course, a $\mathcal{P}$-derivation can be qualified further by saying input-consuming
$\mathcal{P}$-derivation etc.

Input-consuming ($\mathcal{P}$-)derivations may end in a query where no atom can be
selected. This situation is called deadlock. It is a form of termination. 

We now define simply-local substitutions, which reflect the way simply moded 
clauses become instantiated in input-consuming derivations [7]. Given a clause 
$c = p(\mathbf{t}_0, \mathbf{s}_{n+1}) \leftarrow p_1(\mathbf{s}_1, \mathbf{t}_1), \ldots, p_n(\mathbf{s}_n, \mathbf{t}_n)$, first $\mathbf{t}_0$ becomes instantiated, and the
range of that substitution contains only variables from outside of $c$. Then, by
resolving $p_1(\mathbf{s}_1, \mathbf{t}_1)$, $\mathbf{t}_1$ becomes instantiated, and the range of that substitution
contains variables from outside of $c$ and from $\mathbf{s}_1$. Continuing in the same way,
finally, $\mathbf{t}_n$ becomes instantiated, and the range of that substitution contains
variables from outside of $c$ and from $\mathbf{s}_1 \ldots \mathbf{s}_n$.

Definition 6. The substitution $\sigma$ is simply-local wrt. the clause $c =
p(\mathbf{t}_0, \mathbf{s}_{n+1}) \leftarrow p_1(\mathbf{s}_1, \mathbf{t}_1), \ldots, p_n(\mathbf{s}_n, \mathbf{t}_n)$ if there exist substitutions $\sigma_0, \sigma_1, \ldots, \sigma_n$
and disjoint sets $V_0, V_1, \ldots, V_n$ consisting of fresh (wrt. $c$) variables such that
$\sigma = \sigma_0\sigma_1 \cdots \sigma_n$, where for $i \in [0..n]$,

- $Dom(\sigma_i) \subseteq Vars(\mathbf{t}_i)$,

- $Ran(\sigma_i) \subseteq Vars(\mathbf{s}_i\sigma_0\sigma_1 \cdots \sigma_{i-1}) \cup V_i$.¹

$\sigma$ is simply-local wrt. a query $\mathbf{B}$ if $\sigma$ is simply-local wrt. the clause $dummy \leftarrow \mathbf{B}$.

In the case of a simply-local substitution wrt. a query, $\sigma_0$ is the identity.

Example 7. Consider DELETE in Fig. 1, with mode delete(I, O, O). The
substitution $\sigma$ = {Y/V, Zs/[W], Xs/[], X/W} is simply-local wrt. the recursive clause:
let $\sigma_0$ = {Y/V, Zs/[W]}, $\sigma_1$ = {X/W, Xs/[]}, and $\sigma_2 = \emptyset$; then $Dom(\sigma_0) \subseteq$ {Y, Zs},
$Ran(\sigma_0) \subseteq V_0$ where $V_0$ = {V, W}, $Dom(\sigma_1) \subseteq$ {Xs, X}, $Ran(\sigma_1) \subseteq Vars(\mathtt{Zs}\sigma_0)$.

¹ Note that $\mathbf{s}_0$ is undefined. By abuse of notation, $Vars(\mathbf{s}_0 \ldots) = \emptyset$.

permute([],[]).
permute(Ys,[X|Xs]) <-
    delete(Ys,Zs,X),
    permute(Zs,Xs).

delete([X|Xs],Xs,X).
delete([Y|Zs],[Y|Xs],X) <-
    delete(Zs,Xs,X), call_late.
call_late.

Fig. 1. PERMUTE (DELETE)



We can safely assume that all the mgu's employed in an input-consuming
derivation of a simply moded program with a simply moded query are simply-local,
that is to say: if $\mathbf{A}, B, \mathbf{C} \stackrel{\sigma}{\Longrightarrow} (\mathbf{A}, \mathbf{B}, \mathbf{C})\sigma$ is an input-consuming step using clause
$c = H \leftarrow \mathbf{B}$, then $\sigma = \sigma_0\sigma_1$ and $\sigma_0$ ($= \sigma|_H$) is simply-local wrt. the clause $H \leftarrow$
and $\sigma_1$ ($= \sigma|_B$) is simply-local wrt. the atom² $B$ [7, Lemma 3.8]. This assumption
is crucial in the proofs of the results of this paper. 

In [7], a particular notion of model is defined, which reflects the substitutions 
that can be computed by input-consuming derivations. According to this notion, 
a model is a set of not necessarily ground atoms. Here, we generalise this notion 
so that it reflects the substitutions that can be computed by input-consuming 
$\mathcal{P}$-derivations. This generalisation is crucial for the results in Subsec. 4.3.

Definition 8. Let $M \subseteq Atom$. We say that $M$ is a simply-local $\mathcal{P}$-model of
$c = H \leftarrow B_1, \ldots, B_n$ if for every substitution $\sigma$ simply-local wrt. $c$,

    if $B_1\sigma, \ldots, B_n\sigma \in M$ and $H\sigma \in \mathcal{P}$ then $H\sigma \in M$.    (1)

$M$ is a simply-local $\mathcal{P}$-model of a program $P$ if $M$ is a simply-local $\mathcal{P}$-model
of each clause of $P$.

We denote the set of all simply moded atoms² for the program $P$ by $SM_P$.

Least simply-local $\mathcal{P}$-models, possibly containing $SM_P$, can be computed by
a variant of the well-known $T_P$-operator [7].

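Definition 9. Given a program $P$ and $I \subseteq Atom$, we define

$$T_P^{SL\mathcal{P}}(I) = \{H\sigma \mid \exists\, c = H \leftarrow B_1, \ldots, B_n \text{ variant of a clause in } P,\ \sigma \text{ is simply-local wrt. } c,\ B_1\sigma, \ldots, B_n\sigma \in I,\ H\sigma \in \mathcal{P}\},$$
$$\widehat{T}_P^{SL\mathcal{P}}(I) = I \cup T_P^{SL\mathcal{P}}(I).$$

We denote the least simply-local $\mathcal{P}$-model of $P$ containing $SM_P$ by $PM_P^{SL\mathcal{P}}$.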

Example 10. Consider the program DELETE (see Fig. 1) ignoring the call_late
predicate. Recall Ex. 7. Let $\mathcal{P}$ be the set containing all atoms using delete.
$SM_P$ consists of all atoms of the form delete(vs, Us, U) where Us, U $\notin$ Vars(vs).
To construct $PM_P^{SL\mathcal{P}}$, we iterate $T_P^{SL\mathcal{P}}$ starting from any atom in $SM_P$ (the
resulting atoms are written on the l.h.s. below) and the fact clause (r.h.s.). Each
line below corresponds to one iteration of $T_P^{SL\mathcal{P}}$. We have $PM_P^{SL\mathcal{P}}$ =



2 We sometimes say “atom” for “query containing only one atom” . 







Jan-Georg Smaus 



{ delete(vs, Us, U),

  delete([y_1|vs], [y_1|Us], U),            delete([x_1|xs_1], xs_1, x_1),

  delete([y_2,y_1|vs], [y_2,y_1|Us], U),    delete([y_1,x_1|xs_1], [y_1|xs_1], x_1),    (2)

  ...

  | vs, xs_1, x_1, y_1, y_2, ... arbitrary where Us, U $\notin$ Vars(vs) }.

Observe the variable occurrences of U, Us in the atoms on the l.h.s. In Ex. 14, we 
will see the importance of such variable occurrences. 

In the above example, we assume that $\mathcal{P}$ is the set of all atoms, and so the
simply-local $\mathcal{P}$-model is in fact a simply-local model [7]. In order to obtain a necessary
termination criterion, the approach of [7] required some small modifications in
many places, one of them being the generalisation of simply-local models to
simply-local $\mathcal{P}$-models. However, we are not aware of a practical situation where
one has to consider a simply-local $\mathcal{P}$-model that is not a simply-local model.

The model semantics given here is equivalent to the operational semantics. 
We do not formally state this equivalence here for lack of space, but it is used 
in the proofs of the termination results of the following sections [21]. 

4 Termination Without Requiring Local Selection Rules 

4.1 Simply $\mathcal{P}$-Acceptable Programs

The following concept is adopted from Apt [1] . 

Definition 11. Let $p, q$ be predicates in a program $P$. We say that $p$ refers to
$q$ if there is a clause in $P$ with $p$ in its head and $q$ in its body, and $p$ depends
on $q$ (written $p \sqsupseteq q$) if $(p, q)$ is in the reflexive, transitive closure of refers to.
We write $p \sqsupset q$ if $p \sqsupseteq q$ and $q \not\sqsupseteq p$, and $p \simeq q$ if $p \sqsupseteq q$ and $q \sqsupseteq p$.

We extend this notation to atoms, e.g. $p(\mathbf{s}, \mathbf{t}) \simeq q(\mathbf{u}, \mathbf{v})$ if $p \simeq q$.
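
The relations of Def. 11 can be computed from the "refers to" graph of a program. The following Python sketch is illustrative; the dictionary encoding of the program and the function names are assumptions, and the example graph is read off the PERMUTE program of Fig. 1.

```python
def depends_on(refers_to):
    """Reflexive, transitive closure of the 'refers to' relation.

    refers_to maps each predicate to the predicates occurring in the
    bodies of its clauses. The result maps p to the set {q | p depends on q}.
    """
    preds = set(refers_to) | {q for qs in refers_to.values() for q in qs}
    closure = {p: {p} for p in preds}  # reflexivity
    changed = True
    while changed:
        changed = False
        for p in preds:
            for q in list(closure[p]):
                new = set(refers_to.get(q, ())) - closure[p]
                if new:
                    closure[p] |= new
                    changed = True
    return closure


def mutually_dependent(closure, p, q):
    """p and q depend on each other (third relation of Def. 11)."""
    return q in closure[p] and p in closure[q]


def strictly_above(closure, p, q):
    """p depends on q but not vice versa (second relation of Def. 11)."""
    return q in closure[p] and p not in closure[q]


# PERMUTE (Fig. 1): permute calls delete and permute; delete calls delete
# and call_late; call_late is a fact.
permute_graph = {
    "permute": {"delete", "permute"},
    "delete": {"delete", "call_late"},
    "call_late": set(),
}
closure = depends_on(permute_graph)
```

For PERMUTE, the recursive clause for permute only needs a level decrease for its second body atom, since `strictly_above(closure, "permute", "delete")` holds.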

Definition 12. A program is input $\mathcal{P}$-terminating if all input-consuming
$\mathcal{P}$-derivations starting in a simply moded query are finite.

Previously, we had defined input termination, which is input $\mathcal{P}$-termination for
$\mathcal{P}$ being the set of all atoms [7]. We now give a sufficient and necessary criterion
for input $\mathcal{P}$-termination.

Definition 13. Let $P$ be a simply moded program, $|.|$ a moded level mapping
and $M$ a simply-local $\mathcal{P}$-model of $P$ containing $SM_P$. A clause $H \leftarrow B_1, \ldots, B_n$
is simply $\mathcal{P}$-acceptable by $|.|$ and $M$ if for every substitution $\sigma$ simply-local
wrt. it, for all $i \in [1..n]$,

    $B_1\sigma, \ldots, B_{i-1}\sigma \in M$ and $H \simeq B_i$ and $H\sigma \in \mathcal{P}$ and $B_i\sigma \in \mathcal{P}$ imply $|H\sigma| > |B_i\sigma|$.    (3)

The program $P$ is simply $\mathcal{P}$-acceptable by $|.|$ and $M$ if each clause of $P$ is
simply $\mathcal{P}$-acceptable by $|.|$ and $M$.

The difference to the definition of a simply acceptable clause [7] is in the
conditions $B_i\sigma \in \mathcal{P}$ and $H\sigma \in \mathcal{P}$. The condition $B_i\sigma \in \mathcal{P}$ may seem more natural
than the condition $H\sigma \in \mathcal{P}$, since $\sigma$, due to the model condition, reflects the
degree of instantiation that $B_i$ may have when it is selected. But for $i = 1$, $\sigma$
reflects the degree of instantiation of the entire clause obtained by unification
when the clause is used for an input-consuming $\mathcal{P}$-derivation step. Moreover, the
condition $H\sigma \in \mathcal{P}$ is important for showing necessity (Subsec. 4.3).

Note that a decrease between the head and body atoms must be shown only
for the atoms where $H \simeq B_i$. The idea is that termination is shown incrementally,
so we assume that for the $B_i$ where $H \sqsupset B_i$, termination has been shown already.
One can go further and explicitly give modular termination results [7, 10], but 
this is a side issue for us and we refrain from it for space reasons. 

The following is the standard example of a program that requires bounded- 
ness as additional condition on selected atoms (see point 2 in the introduction) . 

Example 14. Consider PERMUTE in mode permute(I, O), delete(I, O, O)
(Fig. 1). Recall Ex. 10. As norm we take the list-length function, and we
define the level mapping as |permute(ys, xs)| = |ys| and |delete(ys, zs, x)| = |ys|.
Now for all atoms delete(ys, zs, x) $\in PM_P^{SL\mathcal{P}}$, we have $|ys| \geq |zs|$; for the ones
on the r.h.s. even $|ys| > |zs|$. Let $\mathcal{P}$ be the set of bounded atoms wrt. |.|.

Now let us look at the recursive clause for permute. We verify that the second
body atom fulfils the requirement of Def. 13, where $M$ is $PM_P^{SL\mathcal{P}}$. So we have to
consider all simply-local substitutions $\sigma$ such that delete(Ys, Zs, X)$\sigma \in PM_P^{SL\mathcal{P}}$.
For the atoms on the l.h.s. in (2), this means that

    $\sigma \supseteq$ {Ys/[y_n, ..., y_1|vs], Zs/[y_n, ..., y_1|Us], X/U}    ($n \geq 0$).

Clearly, permute(Zs, Xs)$\sigma \notin \mathcal{P}$, and hence no proof obligation arises. For the
atoms on the r.h.s. in (2), this means that

    $\sigma \supseteq$ {Ys/[y_n, ..., y_1, x_1|xs_1], Zs/[y_n, ..., y_1|xs_1], X/x_1}    ($n \geq 0$).

But then |permute(Ys, [X|Xs])$\sigma$| > |permute(Zs, Xs)$\sigma$|.
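
Spelled out for this second case, taking the level of a permute atom to be the list-length of its first (input) argument and writing $|xs_1|$ for the list-length of $xs_1$, the two levels are as follows. This is a worked instance added for illustration.

```latex
\begin{align*}
|\mathtt{permute}(\mathtt{Ys},[\mathtt{X}|\mathtt{Xs}])\sigma|
  &= |[y_n,\ldots,y_1,x_1|xs_1]| = n + 1 + |xs_1| \\
|\mathtt{permute}(\mathtt{Zs},\mathtt{Xs})\sigma|
  &= |[y_n,\ldots,y_1|xs_1]| = n + |xs_1|
\end{align*}
```

So the level decreases by exactly one from the head to the recursive body atom, whatever the instantiation of $xs_1$.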

The other clauses are trivial to check, and so PERMUTE is simply $\mathcal{P}$-acceptable.
Observe that only the model of DELETE played a role in our argument, not the 
model of PERMUTE. 

The atom call_late only serves the purpose of allowing for non-local
derivations, to emphasise that locality is not needed for termination (see point 4 in
Sec. 1). Without this atom, all $\mathcal{P}$-derivations would automatically be local.

The following infinite derivation (ignoring call_late), input-consuming but
not a $\mathcal{P}$-derivation, demonstrates that the program does not input-terminate:

permute([1], W) => delete([1], Zs', X'), permute(Zs', Xs') =>
delete([], Xs", X'), permute([1|Xs"], Xs') =>
delete([], Xs", X'), delete([1|Xs"], Zs"', X"'), permute(Zs"', Xs') =>
delete([], Xs", X'), delete(Xs", Xs"", X"'), permute([1|Xs""], Xs') => ...

reverse([X|Xs],Ys) <-
    append_sing(Zs,X,Ys),
    reverse(Xs,Zs).
reverse([],[]).

append_sing([X|Xs],Y,[X|Zs]) <-
    append_sing(Xs,Y,Zs).
append_sing([],Y,[Y]).

Fig. 2. NAIVE_REVERSE

4.2 Sufficiency of Simply $\mathcal{P}$-Acceptability

In [7], we find a result stating that simply acceptable programs are input
terminating. The following theorem generalises this result to $\mathcal{P}$-derivations. The
proofs of all results of this paper can be found in [21].

Theorem 15. Let $P$ be a simply moded program. Let $M$ be a simply-local
$\mathcal{P}$-model of $P$ containing $SM_P$. Suppose that $P$ is simply $\mathcal{P}$-acceptable by $M$ and
a moded level mapping $|.|$. Then $P$ is input $\mathcal{P}$-terminating.

We give three further examples. The first supports point 2 in the introduction. 

Example 16. The program NAIVE_REVERSE (Fig. 2) in mode reverse(O, I),
append_sing(O, O, I) is not input terminating, but it is input $\mathcal{P}$-terminating
for $\mathcal{P}$ chosen in analogy to Ex. 14.

The next example illustrates point 1 in the introduction. 

Example 17. Let PERMUTE2 be the program obtained from PERMUTE by replacing
the recursive clause for delete by its most specific variant [17]:

delete([Y,H|T],[Y|Xs],X) <- delete([H|T],Xs,X).

Assume |.| and the modes as in Ex. 14. As in Ex. 10 we have

$SM_P$ = {delete(vs, Us, U) | vs arbitrary},

but when we start applying $T_P^{SL\mathcal{P}}$ to the atoms in $SM_P$, then due to the
modified clause above, only the atoms of the form delete([v|vs'], Us, U) contribute;
delete([], Us, U) does not contribute:

$PM_P^{SL\mathcal{P}}$ = { delete(vs, Us, U),

  delete([y_1,v|vs'], [y_1|Us], U),            delete([x_1|xs_1], xs_1, x_1),

  delete([y_2,y_1,v|vs'], [y_2,y_1|Us], U),    delete([y_1,x_1|xs_1], [y_1|xs_1], x_1),

  ...

  | vs, v, vs', xs_1, x_1, y_1, y_2, ... arbitrary where Us, U $\notin$ Vars(vs, v, vs') }.

We show that the program is simply $\mathcal{P}$-acceptable by |.| and $PM_P^{SL\mathcal{P}}$, where $\mathcal{P}$
is the set of atoms that are at least non-variable in their input positions. As in
Ex. 14, we focus on the second body atom of the recursive clause for permute. We
have to consider all simply-local substitutions $\sigma$ such that delete(Ys, Zs, X)$\sigma \in
PM_P^{SL\mathcal{P}}$, and moreover permute(Zs, Xs)$\sigma \in \mathcal{P}$. It is easy to see that for all such
$\sigma$, we have |permute(Ys, [X|Xs])$\sigma$| > |permute(Zs, Xs)$\sigma$|. The important point is
that the atoms of the form delete(vs, Us, U) $\in PM_P^{SL\mathcal{P}}$ do not give rise to a
proof obligation since permute(Us, _) $\notin \mathcal{P}$.




Termination of Logic Programs Using Various Dynamic Selection Rules 






The following is an example of “hybrid” selection rules (point 3 in Sec. 1). 

Example 18. For space reasons, we only sketch this example. A program for the 
well-known n-queens problem has the following main clause: 

nqueens(N,Sol) <-
    sequence(N,Seq), permute(Seq,Sol), safe(Sol).

We could implement permute as in Ex. 14 or as in Ex. 17. In either case, we
have a non-trivial $\mathcal{P}$. In contrast, $\mathcal{P}$ may contain all atoms using safe. In fact,
for efficiency reasons, atoms using safe should be selected as early as possible.

Note that such a hybrid selection rule can be implemented by means of the 
default left-to-right selection rule [22]. To this end, the second and third atoms 
must be swapped. Since none of the results in this paper depends on the
textual position of atoms, they still apply to the thus modified program.
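
A hybrid property of this kind can be modelled as a per-predicate test on the input arguments of an atom. The Python sketch below is an illustration under stated assumptions: atoms are encoded as triples (predicate, input terms, output terms), the table `RULES` and the function `selectable` are hypothetical names, and permute and delete are given the "non-variable input" requirement of Ex. 17 while all other predicates are unrestricted. Both tests depend only on input positions and are closed under instantiation, as Def. 5 requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A logical variable (hypothetical encoding)."""
    name: str


def is_nonvar(term):
    """A non-variable term stays non-variable under instantiation."""
    return not isinstance(term, Var)


def unrestricted(term):
    """No requirement at all on the argument."""
    return True


# Per-predicate requirement on each input argument (assumed for illustration).
RULES = {"permute": is_nonvar, "delete": is_nonvar}


def selectable(atom, rules=RULES):
    """Decide membership of an atom in the hybrid property.

    atom is a triple (predicate, inputs, outputs); outputs are ignored.
    Predicates without an entry in rules are unrestricted.
    """
    pred, inputs, _outputs = atom
    test = rules.get(pred, unrestricted)
    return all(test(t) for t in inputs)


blocked = ("permute", [Var("Zs")], [Var("Xs")])
ready = ("permute", [(".", 1, Var("T"))], [Var("Xs")])
# Treating the argument of safe as an input is an assumption; the rule for
# safe is unrestricted, so the choice does not affect the result.
eager = ("safe", [Var("Sol")], [])
```

With these definitions, `selectable(blocked)` is false, while `selectable(ready)` and `selectable(eager)` are true.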

4.3 Necessity of Simply $\mathcal{P}$-Acceptability

We now give the converse of Theorem 15, namely that our criterion for proving 
input $\mathcal{P}$-termination wrt. simply moded queries is also necessary. The level
mapping is constructed as a kind of tree that reflects all possible input-consuming
$\mathcal{P}$-derivations, following the approach of [7] which in turn is based on [5]. But
for space reasons, we only state the main result. 

Theorem 19. Let $P$ be a simply moded program and $\mathcal{P}$ a set of atoms
according to Def. 5. If $P$ is input $\mathcal{P}$-terminating then $P$ is simply $\mathcal{P}$-acceptable. In
particular, $P$ is simply $\mathcal{P}$-acceptable by $PM_P^{SL\mathcal{P}}$.

5 Local Selection Rules 

In this section, we adapt the results of the two previous sections to local selection 
rules. First note that local derivations genuinely need special treatment, since 
one cannot express locality as a property $\mathcal{P}$ of the selected atoms. Note also that
local [14] and simply-local [6, 7] are completely different concepts. 

Assuming local selection rules is helpful for showing termination, since one 
can exploit model information almost in the same way as for LD derivations [4] . 
In fact, some arguments are simpler here than in the previous two sections, 
manifest in the proofs [21]. However, this is also due to the fact that we currently 
just have a sufficient termination criterion for local derivations. How this criterion 
must be adapted to become also necessary is a topic for future work. 

A simply-local model [7] is a simply-local $\mathcal{P}$-model where $\mathcal{P}$ is the set of all
atoms. Analogously, we write $PM_P^{SL}$ instead of $PM_P^{SL\mathcal{P}}$ in this case. To reflect
the substitutions that can be computed by local derivations, we need as model
the union of a simply-local model (for the completely resolved atoms) and the
set of simply moded atoms (for the unresolved atoms).

Let $M$ be the least simply-local model of $P$ (note: not the least simply-local
model of $P$ containing $SM_P$). We define $LM_P^{SL} := SM_P \cup M$. So $LM_P^{SL}$ contains
$SM_P$, but unlike $PM_P^{SL}$, does not involve applications of $T_P^{SL}$ to atoms in $SM_P$.

Example 20. Let $P$ be the following program in mode even(I), minus2(I, O):

even(X) <- minus2(X,Y), even(Y).
even(0).
minus2(X,s(X)) <- fail.
minus2(s(s(X)),X).

We have

$SM_P$ = {even(x), minus2(x, Z), fail | x arbitrary where Z $\notin$ Vars(x)}
$PM_P^{SL}$ = $SM_P \cup$ {even(0), minus2(s(s(x)), x), minus2(x, s(x)),
even(s(s(x))), even(x) | x arbitrary}.

The minimal simply-local model of $P$, not containing $SM_P$, is the following:

$M$ = {even(s^{2n}(0)), minus2(s(s(x)), x) | $n \in \mathbb{N}_0$, x arbitrary}.

Then $LM_P^{SL} = SM_P \cup M$. In contrast to $PM_P^{SL}$ we have minus2(x, s(x)) $\notin LM_P^{SL}$.
This reflects that in a local derivation, resolving an atom with the clause head 
minus2(X, s(X)) will definitely lead to finite failure. For this program, locality is 
crucial for termination (see point 5 in Sec. 1). 

The example³ is contrived since the first clause for minus2 is completely
unnecessary, yet natural enough to suggest that there might be a “real” example. 

We now proceed with the treatment of termination. 

Definition 21. A program is local $\mathcal{P}$-terminating if all input-consuming local
$\mathcal{P}$-derivations starting in a simply moded query are finite.

We now give a sufficient criterion for local $\mathcal{P}$-termination.

Definition 22. Let $P$ be a simply moded program, $|.|$ a moded level mapping
and $M$ a set such that $SM_P \subseteq M$ and for some simply-local model $M'$ of $P$,
$M' \subseteq M$. A clause $A \leftarrow B_1, \ldots, B_n$ is local $\mathcal{P}$-acceptable by $|.|$ and $M$⁴ if
for every substitution $\sigma$ simply-local wrt. it, for all $i \in [1..n]$,

    $(B_1, \ldots, B_{i-1})\sigma \in M$ and $A \simeq B_i$ and $B_i\sigma \in \mathcal{P}$ implies $|A\sigma| > |B_i\sigma|$.

The program $P$ is local $\mathcal{P}$-acceptable by $|.|$ and $M$ if each clause of $P$ is local
$\mathcal{P}$-acceptable by $|.|$ and $M$.

Example 23. Consider again the program in Ex. 20, in particular the recursive
clause. Let $\mathcal{P}$ be the set of atoms where all input arguments are non-variable and
|even(x)| = |minus2(x, y)| = |x| where |.| is the term size norm. We verify that
the second body atom fulfils the requirement of Def. 22, taking $M = LM_P^{SL}$. We
have to consider all simply-local $\sigma$ such that minus2(X, Y)$\sigma \in LM_P^{SL}$. So

    $\sigma \supseteq$ {X/x, Y/Z} or $\sigma \supseteq$ {X/s(s(x)), Y/x}.

3 Thanks to Felix Klaedtke for inspiring the example! 

4 This terminology should be regarded as provisional. If a sufficient and necessary
condition for local $\mathcal{P}$-termination different from the one given here is found eventually,
then it should be called “local $\mathcal{P}$-acceptable” rather than inventing a new name.







In the first case, even(Y)$\sigma \notin \mathcal{P}$ and hence no proof obligation arises. In the second
case, |even(X)$\sigma$| > |even(Y)$\sigma$|. Hence the clause is local $\mathcal{P}$-acceptable. Note that
the clause is not simply $\mathcal{P}$-acceptable (due to minus2(x, s(x)) $\in PM_P^{SL}$).

Observe that unlike [14], we do not require that the selected atoms must be 
bounded. In our formalism, the instantiation requirements of the selected atom 
and the locality issue are two separate dimensions. 

Theorem 24. Let $P$ be a simply moded program. Let $M$ be a set such that
$SM_P \subseteq M$ and for some simply-local model $M'$ of $P$, $M' \subseteq M$. Suppose that
$P$ is local $\mathcal{P}$-acceptable by $M$ and a moded level mapping $|.|$. Then $P$ is local
$\mathcal{P}$-terminating.



6 Conclusion 

We have presented a framework for proving termination of logic programs with 
dynamic scheduling. We have considered various assumptions about the selection 
rule, in addition to the assumption that derivations must be input-consuming. 
On the one hand, derivations can be restricted by giving a property $\mathcal{P}$ that the
selected atoms must fulfil. On the other hand, derivations may or may not be 
required to be local. These aspects can be combined freely. We now refer back 
to the five points in the introduction. 

Some programs terminate under an assumption about the selection rule 
that is just slightly stronger than assuming input-consuming derivations (point 
1). Others need what we call strong assumptions: the selected atom must be 
bounded wrt. a level mapping (point 2). Different versions of PERMUTE, which is 
the standard example of a program that tends to loop for dynamic scheduling 
[17], are representatives of these program classes. Then there are programs for 
which one should make hybrid assumptions about the selection rule: depend- 
ing on the predicate, an atom should be bounded in its input positions or not 
(point 3). Considering our work together with [7], it is no longer true that “the 
termination behaviour of ‘delay until nonvar’ is poorly understood” [14]. 

The authors of [14] have assumed local selection rules. There are programs 
for which this assumption is genuinely needed. Abstractly, this is the case for a 
query $A_1, \ldots, A_n$ where for some atom $A_i$ and some clause $c$, the subderivations
associated with $A_i$ and $c$ all fail, but at the same time, the unification between $A_i$
and $c$'s head produces a substitution that may trigger an infinite derivation for
some atom $A_j$, where $j > i$. In this case, locality ensures failure of $A_i$ before the
infinite derivation of $A_j$ can happen. The comparison between our model notions
(see Ex. 20) also clarifies the role of locality: substitutions obtained by partial 
resolution of an atom can be disregarded (point 5). But we are not aware of a 
realistic program where this matters (point 4). As an obvious consequence, we 
have no realistic program that we can show to terminate for local derivations and 
the method of [14] cannot. But the better understanding of the role of locality 
may direct the search for such an example. 
For derivations that are not assumed to be local, we obtain a sufficient and
necessary termination criterion. For local derivations, our criterion is sufficient 
but probably not necessary. Finding a necessary criterion is a topic for future 
work. This would be an important advance over [14], since the criterion given 
there is known not to be necessary. 

The concepts of input-consuming derivations and $\mathcal{P}$-derivations are both
meant to be abstract descriptions of dynamic scheduling. Delay declarations
that check for arguments being at least non-variable, or at least non-variable
in some sub-argument [12,23], are often adequate for ensuring input-consuming
derivations with $\mathcal{P}$ stating that the input arguments are at least non-variable
(see Ex. 17). Delay declarations that check for groundness are adequate for en- 
suring boundedness of atoms (see Ex. 14). In general groundness is stronger 
than boundedness, but we are not aware of delay declarations that could check 
for boundedness, e.g., check for a list being nil-terminated. This deficiency has 
been mentioned previously [13]. Hybrid selection rules can be realised with delay 
declarations combined with the default left-to-right selection rule (see Ex. 18). 
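
To make the two kinds of check concrete, here is a minimal sketch using when/2, a coroutining primitive available in SICStus Prolog and SWI-Prolog. It is not one of the examples referred to above, and the predicate names delay_until_nonvar/2 and delay_until_ground/2 are hypothetical.

```prolog
% Suspend Goal until X is at least non-variable.
delay_until_nonvar(X, Goal) :-
    when(nonvar(X), Goal).

% Suspend Goal until X is ground, which is a stronger requirement.
delay_until_ground(X, Goal) :-
    when(ground(X), Goal).
```

With the first predicate, binding X to f(_) already wakes the goal; with the second, the goal stays suspended until the argument of f is ground as well.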

Concerning automation of our method, the problems are not so different from 
the ones encountered when proving left-termination: we have to reason about 
infinite models — to do so, abstract interpretation approaches, where terms 
are abstracted as their norms, may be useful [11,16]. It seems that in our case 
automation is additionally complicated because we have to consider infinitely 
many simply-local substitutions. But looking at Ex. 10, we have terms $y_1, y_2, \ldots$
that are arbitrary and whose form does not affect the termination problem. 
Hence it may be sufficient to consider most general substitutions in applications 
of $T_P^{L\mathcal{P}}$.

Another topic for future work is, of course, a practical evaluation, looking 
at a larger program suite. In this context, it would be desirable to infer a $\mathcal{P}$, as
unrestrictive as possible, automatically. Also, we should consider the following 
issues: (1) possible generalisations of the results in Sec. 5, leaving aside the as- 
sumption of input-consuming derivations; (2) a termination criterion that would 
capture programs that terminate for certain (intended) queries, but not for all 
queries; (3) relaxing the condition of simply moded programs. 
Abstract. Refactoring is an established technique from the OO-community to
restructure code: it aims at improving software readability, maintainability and 
extensibility. Although refactoring is not tied to the OO-paradigm in particular, 
its ideas have not been applied to Logic Programming until now. 

This paper applies the ideas of refactoring to Prolog programs. A catalogue is pre- 
sented listing refactorings classified according to scope. Some of the refactorings 
have been adapted from the OO-paradigm, while others have been specifically 
designed for Prolog. Also the discrepancy between intended and operational se- 
mantics in Prolog is addressed by some of the refactorings. 

In addition, ViPReSS, a semi-automatic refactoring browser, is discussed and the 
experience with applying ViPReSS to a large Prolog legacy system is reported. 
Our main conclusion is that refactoring is not only a viable technique in Prolog 
but also a rather desirable one. 



1 Introduction 

Program changes take up a substantial part of the entire programming effort. Often 
changes are required to incorporate additional functionality or to improve efficiency. In 
both cases, a preliminary step of improving the design without altering the external be- 
haviour is recommended. This methodology, called refactoring, emerged from a num- 
ber of pioneer results in the OO-community [6, 13, 15] and recently came to prominence 
for functional languages [11]. More formally, refactoring is a source-to-source program 
transformation that changes program structure and organisation, but not program func- 
tionality. The major aim of refactoring is to improve readability, maintainability and 
extensibility of the existing software. While performance improvement is not considered
a crucial issue for refactoring, it can be noted that well-structured software is
more amenable to performance tuning. We also observe that certain techniques that 
were developed in the context of program optimisation, such as dead-code elimination 
and redundant argument filtering, can improve program organisation and, hence, can 
be considered refactoring techniques. In this paper we discuss additional refactoring
techniques for Prolog programs. 

To achieve the above goals two questions need to be answered: where and how trans- 
formations need to be performed. Unlike automated program transformations, neither 
of the steps aims at transforming the program fully automatically. The decision whether 
to transform is left to the program developer. However, providing automated support 
for refactoring is useful and an important challenge. 

Deciding automatically where to apply a transformation can be a difficult task on 
its own. Several ways to resolve this may be considered. First, program analysis ap- 
proaches can be used. For example, it is common practice while ordering predicate 
arguments to start with the input arguments and end with the output arguments. Mode 
information can be used to detect when this rule is violated and to suggest that the
user reorder the arguments. Second, machine learning techniques can be used to predict
further refactorings based on those already applied. Useful sequences of refactoring steps
can be learned analogously to automated macro construction [9]. Following these
approaches, automatic refactoring tools, so-called refactoring browsers, can be expected
to make suggestions on where refactoring transformations should be applied. These 
suggestions can then be either confirmed or rejected by the program developer. 

Answering how the program should be transformed might also require the user’s in- 
put. Consider for example a refactoring that renames a predicate: while automatic tools 
can hardly be expected to guess the new predicate name, they should be able to detect 
all program points affected by the change. Other refactorings require certain properties,
such as the absence of user-defined meta-predicates, that cannot be easily inferred. It is then
up to the user to evaluate whether the properties hold. 

The outline of this paper is as follows. We first illustrate the use of several refactor- 
ing techniques on a small example in Section 2. Then a more comprehensive catalogue 
of Prolog refactorings is given in Section 3. In Section 4 we introduce ViPReSS, our
refactoring browser, currently implementing most of the refactorings of the catalogue.
ViPReSS has been successfully applied for refactoring a 50,000-line legacy system.
Finally, in Section 5 we conclude.

2 Detailed Prolog Refactoring Example 

We illustrate some of the proposed techniques by a detailed refactoring example.
Consider the following code fragment borrowed from O’Keefe’s “The Craft of Prolog” [12],
p. 195. It describes three operations on a reader data structure used to sequentially read
terms from a file. The three operations are make_reader/3 to initialise the data
structure, reader_done/1 to check whether no more terms can be read and reader_next/3
to get the next term and advance the reader.

O'Keefe's original version 

make_reader(File, Stream, State) :-
    open(File, read, Stream),
    read(Stream, Term),
    reader_code(Term, Stream, State).

reader_code(end_of_file, _, end_of_file) :- !.
reader_code(Term, Stream, read(Term, Stream, Position)) :-
    stream_position(Stream, Position).

reader_done(end_of_file).

reader_next(Term, read(Term, Stream, Pos), State) :-
    stream_position(Stream, _, Pos),
    read(Stream, Next),
    reader_code(Next, Stream, State).



We will now apply several refactorings to the above program to improve its read- 
ability. 

First of all, we use if-then-else introduction to get rid of the ugly red cut in the 
reader_code/3 predicate: 

Replace cut by if-then-else 

reader_code(Term, Stream, State) :-
    (   Term = end_of_file,
        State = end_of_file ->
        true
    ;
        State = read(Term, Stream, Position),
        stream_position(Stream, Position)
    ).



This automatic transformation reveals two malpractices, the first of which is pro- 
ducing output before the commit, something O’Keefe himself disapproves of (p. 97). 
This is fixed manually to: 

Output after commit 

reader_code(Term, Stream, State) :-
    (   Term = end_of_file ->
        State = end_of_file
    ;
        State = read(Term, Stream, Position),
        stream_position(Stream, Position)
    ).



The second malpractice is a unification in the condition of the if-then-else where 
actually an equality test is meant. Suppose that the Term argument is an unbound variable:
the condition then binds it to end_of_file, which is certainly unwanted behaviour. A manual
change yields the following code:

Equality test 

reader_code(Term, Stream, State) :-
    (   Term == end_of_file ->
        State = end_of_file
    ;
        State = read(Term, Stream, Position),
        stream_position(Stream, Position)
    ).
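
The difference between the two conditions can be observed in isolation. The following sketch is only an illustration; the predicate names eof_by_unification/1 and eof_by_equality/1 are hypothetical and do not appear in the reader code.

```prolog
% Condition written with unification: an unbound Term is bound to
% end_of_file, so the test succeeds where it was not meant to.
eof_by_unification(Term) :-
    (   Term = end_of_file -> true ; fail ).

% Condition written with an equality test: Term is never bound, and
% the test succeeds only if Term already is the atom end_of_file.
eof_by_equality(Term) :-
    (   Term == end_of_file -> true ; fail ).
```

For example, ?- eof_by_unification(T). succeeds with T = end_of_file, whereas ?- eof_by_equality(T). fails and leaves T unbound.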
Next, we notice that the sequence read/2, reader_code/3 occurs twice, either by
simple observation or by computing common body subsequences. By applying
predicate extraction to this common sequence, we get:

Predicate extraction 

make_reader(File, Stream, State) :-
    open(File, read, Stream),
    read_next_state(Stream, State).

reader_next(Term, read(Term, Stream, Pos), State) :-
    stream_position(Stream, _, Pos),
    read_next_state(Stream, State).

read_next_state(Stream, State) :-
    read(Stream, Term),
    reader_code(Term, Stream, State).



Next we apply O’Keefe’s own principle of putting the input argument first and the 
output arguments last (p. 14-15): 

Argument reordering 

reader_next(read(Term, Stream, Pos), Term, State) :-
    stream_position(Stream, _, Pos),
    read_next_state(Stream, State).



Finally, we introduce less confusing and overlapping names for the read/3 functor,
the stream_position/[2,3] built-ins and a more consistent naming for make_reader,
more in line with the other two predicates in the interface. O’Keefe stresses the
importance of consistent naming conventions (p. 213).

Note that direct renaming of built-ins such as stream_position is not possible, but 
a similar effect can be achieved by extracting the built-in into a new predicate with the 
desired name. 

Renaming 

reader_init(File, Stream, State) :-
    open(File, read, Stream),
    reader_next_state(Stream, State).

reader_next(reader(Term, Stream, Pos), Term, State) :-
    set_stream_position(Stream, Pos),
    reader_next_state(Stream, State).

reader_done(end_of_file).

reader_next_state(Stream, State) :-
    read(Stream, Term),
    build_reader_state(Term, Stream, State).

build_reader_state(Term, Stream, State) :-
    (   Term == end_of_file ->
        State = end_of_file
    ;
        State = reader(Term, Stream, Position),
        get_stream_position(Stream, Position)
    ).

set_stream_position(Stream, Position) :-
    stream_position(Stream, _, Position).

get_stream_position(Stream, Position) :-
    stream_position(Stream, Position).



While the above changes can be performed manually, a refactoring browser such 
as ViPReSS (see Section 4) guarantees consistency, correctness and furthermore can 
automatically single out opportunities for refactoring. 

3 Comprehensive Catalogue of Prolog Refactorings 

In this section we present a number of refactorings that we have found to be useful when 
Prolog programs are considered. A more comprehensive discussion of the presented 
refactorings can be found in [16]. 

We stress that the programs are not limited to pure logic programs, but may contain
various built-ins such as those defined in the ISO standard [2]. The only exceptions are
higher-order constructs, which are not dealt with automatically, but manually. Automating
the detection and handling of higher-order predicates is an important part of future 
work. 

The refactorings in this catalogue are grouped by scope. The scope expresses the 
user-selected target of a particular refactoring. While the particular refactoring may af- 
fect code outside the selected scope, it is only because the refactoring operation detects 
a dependency outside the scope. 

For Prolog programs we distinguish the following four scopes, based on the code 
units of Prolog: system scope (Section 3.1), module scope (Section 3.2), predicate scope 
(Section 3.3) and clause scope (Section 3.4). 

3.1 System Scope Refactorings 

The system scope encompasses the entire code base. Hence the user does not want to 
transform a particular subpart, but to affect the system as a whole. 



Extract Common Code into Predicates. This refactoring looks for common function- 
ality across the system and extracts it into new predicates. The common functionality 
consists of subsequences of goals that are called in different predicate bodies. By re- 
placing these common subsequences with calls to new predicates the overall readability 
of the program improves. Moreover the increased sharing simplifies maintenance as 
now only one copy needs to be modified. User input is required to decide what com- 
mon sequences form meaningful new predicates. Finding the common sequences and 
the actual replacing are handled automatically by ViPReSS. 
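
One simple way to look for candidate sequences is sketched below. It is not the ViPReSS algorithm: as a simplifying assumption, clause bodies are represented as lists of Name/Arity indicators, so data flow between goals is ignored, and the predicate names contiguous_sublist/2 and common_goal_sequence/3 are hypothetical.

```prolog
:- use_module(library(lists)).   % append/3

% contiguous_sublist(?Sub, +List): Sub is a contiguous run of elements of List.
contiguous_sublist(Sub, List) :-
    append(_, Suffix, List),
    append(Sub, _, Suffix).

% common_goal_sequence(+Body1, +Body2, -Seq): Seq is a run of at least
% two goals, given as Name/Arity, that occurs in both bodies.
common_goal_sequence(Body1, Body2, Seq) :-
    contiguous_sublist(Seq, Body1),
    Seq = [_, _|_],
    contiguous_sublist(Seq, Body2).
```

Applied to the bodies of make_reader/3 and reader_next/3 from Section 2, that is [open/3, read/2, reader_code/3] and [stream_position/3, read/2, reader_code/3], the first answer is [read/2, reader_code/3]. Whether such a sequence forms a meaningful predicate remains a decision for the user.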
Hide Predicates. This refactoring removes export declarations for predicates that are
not imported in any other module. User input is required to confirm that a particular 
predicate is not meant for use outside the module in the future. This refactoring sim- 
plifies the program by reducing the number of entry points into modules and hence the 
intermodule dependencies. 

Remove Dead Code. Dead code elimination is sometimes performed in compilers for 
efficiency reasons, but it is also useful for developers: dead code clutters the program. 

We consider a predicate definition in its entirety as a code unit that can be dead, as 
opposed to a subset of clauses. While eliminating a subset of clauses can change the 
semantics of the predicate and hence lead to an erroneous use, this is not the case if the 
entire predicate is removed. 

It is well-known that reachability of a certain program point (predicate) is, in gen- 
eral, undecidable. However, one can safely approximate the dead code by inspecting 
the predicate dependency graph (PDG) of the system. The PDG connects definitions 
of predicates to the predicates that use them in their own definition. This graph is use- 
ful for other refactorings, like remove redundant arguments. In the system one or more 
predicates should be declared as top-level predicates that are called in top-level queries 
and form the main entry points of the system. Now dead predicates are those predicates 
not reachable from any of the top-level predicates in the PDG. 
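
The reachability approximation can be sketched as a graph traversal. This is an illustration only, assuming the PDG is given as a list of calls(Caller, Callee) edges over Name/Arity indicators; the representation and the predicate names reachable/4 and dead_predicates/4 are hypothetical and not taken from ViPReSS.

```prolog
:- use_module(library(lists)).   % member/2, memberchk/2, append/3

% reachable(+Agenda, +Edges, +Seen0, -Seen): Seen extends Seen0 with all
% predicates reachable from the predicates on Agenda.
reachable([], _, Seen, Seen).
reachable([P|Ps], Edges, Seen0, Seen) :-
    (   memberchk(P, Seen0)
    ->  reachable(Ps, Edges, Seen0, Seen)
    ;   findall(Q, member(calls(P, Q), Edges), Callees),
        append(Callees, Ps, Agenda),
        reachable(Agenda, Edges, [P|Seen0], Seen)
    ).

% dead_predicates(+All, +TopLevel, +Edges, -Dead): Dead lists the
% predicates in All that no top-level predicate can reach.
dead_predicates(All, TopLevel, Edges, Dead) :-
    reachable(TopLevel, Edges, [], Live),
    findall(P, ( member(P, All), \+ memberchk(P, Live) ), Dead).
```

For instance, with top-level predicate main/0 and edges calls(main/0, helper/1) and calls(old/2, older/1), both old/2 and older/1 are reported as dead, even though old/2 calls older/1.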

User input is necessary to decide whether a predicate can safely be removed or should
stay because of some intended future use.

In addition to unused predicate definitions, redundant predicate import declarations 
should also be removed. This may enable the hide predicate refactoring to hide more 
predicates. Dead-code elimination is supported by ViPReSS. 

Remove Duplicate Predicates. Predicate duplication or cloning is a well-known prob- 
lem. One of the prominent causes is the practice known as “copy and paste”. Another 
cause is unawareness of available libraries and exported predicates in other modules. 
The main problem with duplicate code is its poor maintainability: changes to the
code need to be applied to all copies.

Looking for all possible duplications can be quite expensive. In practice in ViPReSS 
we limit the number of possibilities by only considering predicates with identical names 
in different modules as possible duplicates. The search proceeds stratum by stratum
upwards in the stratified PDG. In each stratum the strongly connected components (SCCs)
are compared with each other. If all the predicate definitions in an SCC are identical to 
those in the other component and they depend on duplicate components in lower strata, 
then they are considered duplicates as well. 

It is up to the user to decide whether to throw away some of the duplicates or replace 
all the duplicate predicates by a shared version in a new module. 

Remove Redundant Arguments. The basic intuition here is that parameters that are 
no longer used by a predicate should be dropped. This problem has been studied, among 
others, by Leuschel and Sørensen [10] in the context of program specialisation. They
established that the redundancy property is undecidable and suggested two techniques 
to find safe and effective approximations: top-down goal-oriented RAF and bottom-up 
goal-independent FAR. In the context of refactoring FAR is the more useful technique.
Firstly, FAR is the only possibility if exported predicates are considered. Secondly,
refactoring-based software development regards the development process as a sequence
of small “change - refactor - test” steps. These changes most probably will be local.
Hence, FAR is the technique applied in ViPReSS.

FAR marks an argument position in the head of a clause as unused if it is occupied
by a variable that appears exactly once among the argument positions that have not been
marked as unused. The marking process proceeds bottom-up per strongly connected
component (SCC) of the predicate dependency graph. 

The argument-removing technique should consist of two steps. First, unused argu- 
ment positions are marked by FAR. Second, depending on user input, marked argument 
positions are dropped. As with removing unused predicates (dead code elimination),
removing unused argument positions from predicates improves the readability of the
existing code.
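
A small before-and-after pair may illustrate the two steps. The predicates len_before/3 and len_after/2 are hypothetical examples, not taken from the legacy system. In len_before/3 the second argument is only passed along: once its position is disregarded in the recursive call, the variable occurs nowhere else in the clause, so the position can be marked and, after user confirmation, dropped.

```prolog
% Before: the second argument is threaded through but never inspected.
len_before([], _Unused, 0).
len_before([_|Tail], Unused, N) :-
    len_before(Tail, Unused, N0),
    N is N0 + 1.

% After dropping the marked argument position.
len_after([], 0).
len_after([_|Tail], N) :-
    len_after(Tail, N0),
    N is N0 + 1.
```

All callers of len_before/3 have to be updated to the new arity in the same step, which is where tool support pays off.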

Rename Functor. This refactoring renames a term functor across the system. If the 
functor has several different meanings and only one should be renamed, it is up to 
the user to identify which use corresponds to which meaning. In a typed language, a
meaning would correspond to a type and the distinction could be made automatically.
Alternatively, type information can be inferred and the renaming can be based on it. 

3.2 Module Scope Refactorings 

The module scope considers a particular module. Usually a module is implementing a 
well-defined functionality and is typically contained in one file. 

Merge Modules. Merging a number of modules into one can be advantageous in case of
strong interdependency of the modules involved. Refactoring browsers are expected to 
discover interrelated modules by taking software metrics such as the number of mutu- 
ally imported predicates into account. Upon user confirmation the actual transformation 
can be performed. 

Remove Dead Code Intra-module. Similar to dead code removal for an entire system 
(see Section 3.1), this refactoring works at the level of a single module. It is useful for 
incomplete systems or library modules with an unknown number of uses. The set of top 
level predicates is extended with, or replaced by, the exported predicates of the module. 

Rename Module. This refactoring applies when the name of the module no longer 
corresponds to the functionality it implements, e.g. due to other refactorings. It also 
involves updating import statements in the modules that depend on the module. 

Split Module. This refactoring is the opposite of Merge Modules. By splitting a large 
module into separate modules, the code units become more manageable. Moreover, it is 
easier to reuse a particular functionality if it is contained in a separate module. Similarly 
to the previous refactoring, this one involves updating dependent import statements. 
3.3 Predicate Scope Refactorings

The predicate scope targets a single predicate. The code that depends on the predicate
may need updating as well, but this is considered a consequence of the refactoring: either
the user is alerted to it or the necessary transformations are performed implicitly.

Add Argument. This refactoring should be applied when a callee needs more infor- 
mation from its (direct or indirect) caller. Our experience suggests that the situation is 
very common while developing Prolog programs. It can be illustrated by the following 
example: 

Original Code 

compiler(Program, CompiledCode) :-
    translate(Program, Translated),
    optimise(Translated, CompiledCode).

optimise([assignment(Var, Expr)|Statements], CompiledCode) :-
    optimise_assignment(Expr, OptimisedExpr), ...

optimise([if(Test, Then, Else)|Statements], CompiledCode) :-
    optimise_test(Test, OptimisedTest), ...

optimise_test(Test, OptimisedTest) :- ...



Assume that a new analysis (analyse) of if-conditions has been implemented. Since 
this analysis requires the original program code as an input, the only place to plug the 
call to analyse is in the body of compiler: 

Extended Code 

compiler(Program, CompiledCode) :-
    analyse(Program, AnalysisResults),
    translate(Program, Translated),
    optimise(Translated, CompiledCode).



In order to profit from the results of analyse the variable AnalysisResults should 
be passed all the way down to optimise_test. In other words, an extra argument 
should be added to optimise and optimise_test and its value should be initialised to 
AnalysisResults. 

Hence, given a variable in the body of the caller and the name of the callee, the 
refactoring browser should propagate this variable along all possible computation paths 
from the caller to the callee. This refactoring is an important preliminary step preceding 
additional functionality integration or efficiency improvement. 
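
The result of the refactoring might look as follows. Since the bodies in the original example are elided, this is a self-contained toy version: the stub definitions of analyse/2 and translate/2, the holds/1 declarations and the way optimise_test/3 uses the analysis results are all assumptions made for illustration.

```prolog
:- use_module(library(lists)).   % member/2, memberchk/2

compiler(Program, CompiledCode) :-
    analyse(Program, AnalysisResults),
    translate(Program, Translated),
    optimise(Translated, AnalysisResults, CompiledCode).

% Stub analysis (hypothetical): collect the tests declared to hold.
analyse(Program, Known) :-
    findall(Test, member(holds(Test), Program), Known).

% Stub translation (hypothetical): drop the holds/1 declarations.
translate(Program, Statements) :-
    findall(S, ( member(S, Program), S \= holds(_) ), Statements).

% The added argument is passed along every path to optimise_test/3.
optimise([], _AnalysisResults, []).
optimise([assignment(Var, Expr)|Statements], AnalysisResults,
         [assignment(Var, Expr)|Code]) :-
    optimise(Statements, AnalysisResults, Code).
optimise([if(Test, Then, Else)|Statements], AnalysisResults,
         [if(OptimisedTest, Then, Else)|Code]) :-
    optimise_test(Test, AnalysisResults, OptimisedTest),
    optimise(Statements, AnalysisResults, Code).

% The callee can now consult the analysis results.
optimise_test(Test, AnalysisResults, OptimisedTest) :-
    (   memberchk(Test, AnalysisResults)
    ->  OptimisedTest = true
    ;   OptimisedTest = Test
    ).
```

Note that the assignment clause of optimise/3 gains the argument too, although it does not use it: the variable has to travel along all computation paths that lead to the callee.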

Move Predicate. This refactoring corresponds to the “move method” refactoring of 
Fowler [5]. Moving a predicate from one module to another can improve the overall
structure of the program by bringing together interdependent or related predicates.

Rename Predicate. This is the counterpart of the “rename method” refactoring. It 
can improve readability and should be applied when the name of a predicate does not 
reveal its purpose. Renaming a predicate requires updating the calls to it as well as the 
interface between the defining and importing modules. 
Reorder Arguments. Our experience suggests that while writing predicate definitions
Prolog programmers tend to begin with the input arguments and to end with the output 
arguments. This methodology has been identified as a good practice and even further 
refined by O’Keefe [12] to more elaborate rules. Hence, to improve readability, argu- 
ment reordering is recommended: given the predicate name and the intended order of 
the arguments, the refactoring browser should produce the code such that the arguments 
of the predicate have been appropriately reordered. 

It should be noted that most Prolog systems use indexing on the first argument. 
Argument reordering can improve the efficiency of the program execution in this way. 

Another efficiency improvement is possible. Consider the fact f(a_out, b_in). For
the query ?- f(X, c_in), first the variable X is bound to a_out and then the unification
of c_in with b_in fails. It is more efficient to first unify the input argument and only
if that succeeds bind the output argument. This is somewhat similar to the produce output
after commit refactoring in the next section.

3.4 Clause Scope Refactorings 

The clause scope affects a single clause in a predicate. Usually, this does not affect any 
code outside the clause directly. 

Extract Predicate Locally. Similarly to the system-scope refactoring with the same 
name this technique replaces body subgoals with a call to a new predicate defined by 
these subgoals. Unlike for the system-scope here we do not aim to automatically dis- 
cover useful candidates for replacement or to replace similar sequences in the entire 
system. The user is responsible for selecting the subgoal that should be extracted. 

By restructuring a clause this refactoring technique can improve its readability. Suit- 
able candidates for this transformation are clauses with overly large bodies or clauses 
performing several distinct subtasks. By cutting the bodies of clauses down to size and 
isolating subtasks, it becomes easier for programmers to understand their meaning. 

Invert if-then-else. The idea behind this transformation is that while logically the order 
of the “then” and the “else” branches does not matter, it can be important for code 
readability. Indeed, an important readability criterion is to have an intuitive and simple 
condition. The semantics of the if-then-else construct in Prolog have been for years a 
source of controversy [1] until it was finally fixed in the ISO standard [2]. The main 
issue is that its semantics differ greatly from those of other programming languages. 
Restricting oneself to conditions that do not bind variables but only perform tests
(see footnote 1) makes it easier to understand the meaning of the if-then-else.

To enhance readability it might be worth putting the shorter branch as “then” and 
the longer one as “else”. Alternatively, the negation of the condition may be more read- 
able, for example a double negation can be eliminated. This transformation might also 
disclose other transformations that simplify the code. 

1 This is similar to the guideline in imperative languages not to use assignments or other side
effects in conditions.

Hence, we suggest a technique replacing (P -> Q ; R) with (\+ P -> R ; P, Q).
Of course, for a built-in P ViPReSS generates the appropriate negated built-in instead
of \+ P. The call to P in the “else” branch is there to keep any bindings generated in
P. If it can be inferred that P cannot generate any bindings (e.g. because P is a built-in
known not to generate any bindings), then P can be omitted from the “else” branch.
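
The following pair illustrates the transformation on a condition that does bind a variable. The predicates describe/2 and describe_inverted/2 are hypothetical examples. The sketch assumes that the condition has at most one solution; if P could succeed in several ways, the repeated call to P in the “else” branch would no longer be committed to its first solution, so the two forms could differ on backtracking.

```prolog
% Original orientation: (P -> Q ; R) with P being X = [].
describe(X, Description) :-
    (   X = [] -> Description = empty ; Description = other ).

% Inverted: (\+ P -> R ; P, Q). The call X = [] is repeated in the
% "else" branch to restore the binding that \+ discarded.
describe_inverted(X, Description) :-
    (   \+ X = [] -> Description = other ; X = [], Description = empty ).
```

Both versions answer X = [], Description = empty for an unbound X, and Description = other for X = [a].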

Replace Cut by if-then-else. This technique aims at improving program readability 
by replacing cuts (!) by if-then-else (-> ; ). Despite the controversy on the use of 
cut inside the logic programming community, it is commonly used in practical appli- 
cations both for efficiency and for correctness reasons. We suggest a transformation 
that replaces some uses of cut by the more declarative and potentially more efficient 
if-then-else. 

Example 1. Figure 1 shows how this refactoring in ViPReSS transforms the program on 
the left to the program on the right. 
Fig. 1. Replace cut by if-then-else in ViPReSS.
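
The contents of the figure are not reproduced in this text. As a stand-in, the following pair is a plausible reconstruction based only on the surrounding discussion of fac/2; it is an assumption, not the authors' figure, and the names fac_cut/2 and fac_ite/2 are ours.

```prolog
% Assumed "left" program: the base case commits with a cut.
fac_cut(0, 1) :- !.
fac_cut(N, F) :-
    N1 is N - 1,
    fac_cut(N1, F1),
    F is N * F1.

% Assumed "right" program: the head unifications of the first clause
% become the condition of an if-then-else.
fac_ite(N, F) :-
    (   N = 0, F = 1
    ->  true
    ;   N1 is N - 1,
        fac_ite(N1, F1),
        F is N * F1
    ).
```

Both versions compute the same results for the intended mode, for example F = 6 for N = 3, and both share the problems discussed in the text: the condition unifies rather than tests, and the output F = 1 is produced before the commit.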



The right-hand side program shows that the refactoring preserves operational se- 
mantics. Moreover, assuming that N is the input and F the output of fac/2, the refactor- 
ing reveals hidden malpractices. These malpractices are discussed in more detail in the 
next two refactorings. 

Replace Unification by (In)equality Test. The previous refactoring may expose a 
hidden malpractice: full unifications are used instead of equality or other tests. 

O’Keefe in [12] advocates the importance of steadfast code: code that produces the
right answers for all possible modes and inputs. A more moderate approach is to write 
code that works for the intended mode only. 

Unification succeeds in several modes and so does not convey a particular intended 
mode. Equality (==, =:=) and inequality (\==, =\=) checks usually only succeed for
one particular mode and fail or raise an error for other modes. Hence their presence
makes it easier to see the intended mode, both in the code and at runtime. Moreover, if only
a comparison was intended, then full unification may lead to unwanted behaviour in 
unforeseen cases. 
The two versions of fac/2 in Example 1 use unification to compare N to 0. This
succeeds if N is variable by binding it, although this is not the intended mode of the 
predicate. By replacing N = 0 with N == 0 we indicate that N has to be instantiated 
to 0. This makes it easier for future maintenance to understand the intended mode of 
the predicate. A weaker check is N =:= 0 which allows N to be any expression that 
evaluates to 0. It may be worthwhile to consider a slightly bigger change of semantics: 
N =< 0 turns the predicate into a total function. Another way to avoid an infinite loop 
for negative input is to add N > 0 to the recursive clause. These checks capture the 
intended meaning better than the original unification. 
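
The three ways of comparing N to 0 behave differently outside the intended mode, as the following sketch shows. The predicate names is_zero_unify/1, is_zero_identical/1 and is_zero_arith/1 are hypothetical.

```prolog
% Unification: succeeds for an unbound N by binding it to 0.
is_zero_unify(N) :- N = 0.

% Term identity: succeeds only if N already is the integer 0.
is_zero_identical(N) :- N == 0.

% Arithmetic equality: N may be any expression that evaluates to 0;
% an unbound N raises an instantiation error.
is_zero_arith(N) :- N =:= 0.
```

For the expression 1 - 1, is_zero_arith/1 succeeds while is_zero_identical/1 fails; for an unbound variable, only is_zero_unify/1 succeeds.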

Note that equality tests are cheaper to execute in some Prolog systems, especially if 
they appear as the only goal in the condition of an if-then-else. Nevertheless, the main 
intent of this refactoring is to bring the operational semantics closer to the intended 
semantics of the programmer. If only a comparison is required, then full unification 
may lead to unwanted behaviour in unforeseen cases. 

Produce Output After Commit. Another malpractice that may be revealed by the
replace cut by if-then-else refactoring is producing output before the commit. This
malpractice is disapproved of by O’Keefe in [12], in line with his advocacy for steadfast
predicates.

Now consider what happens with the predicate fac/2 in Example 1 if it is called as
?- fac(0, 0). It does not fail. On the contrary, it backtracks into the second clause and
goes into an infinite loop. On the other hand, the query ?- fac(0, F), F = 0 does fail.
Contrary to the intuition that holds for pure Prolog programs, it is not always valid to
instantiate a query further than was intended by the programmer.

By producing output after the commit, the second clause can no longer be considered
as an alternative for the first query. Hence, the following version of the first
clause has better steadfastness properties: fac(0, F) :- !, F = 1. This refactoring
may have an impact on the efficiency of the code. If the output is produced before a
particular clause or case is committed to and this fails, other cases may be tried, which
incurs an overhead. This is illustrated to the extreme with the non-terminating fac(0, 0)
query.

4 The ViPReSS Refactoring Browser

The refactoring techniques presented above have been implemented in the refactoring
browser ViPReSS². To facilitate acceptance of the tool ViPReSS by the developer
community it has been implemented on the basis of VIM, a popular clone of the well-known
VI editor. Techniques such as predicate duplication are easy to implement with
the text editing facilities of VIM.

Most of the refactoring tasks have been implemented as SICStus Prolog [7] programs
inspecting source files and/or call graphs. Updates to files have been implemented
either directly in the scripting language of VIM or, in case many files had to be
updated at once, through ed scripts. VIM functions have been written to initiate the
refactorings and to get user input.

2 Vi(m) P(rolog) Re(factoring) (by) S(chrijvers) (and) S(erebrenik) 

[Screenshot: a VIM buffer showing a weather_in_us(City, TC) clause whose body calls
temperature(City, TF) and then converts the temperature with two is/2 goals.]

Fig. 2. Screenshot of ViPReSS in action: extract predicate locally.



Figure 2 shows a screenshot of extract predicate locally in VIM. The user selects 
the subgoals that are to be extracted into a predicate and then invokes the refactoring by 
hitting the appropriate key. Then the user enters the desired predicate name. Finally, the 
file is filtered through a Prolog program that generates the new predicate and replaces 
the original goals by a call to it. 
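
The extraction step described above can be sketched as follows. This is a minimal illustration and not the Prolog filter that ViPReSS actually uses: the function names, the string-based clause representation and the sample clause are assumptions made for the example, and variables are recognised with a simple regular expression that ignores quoted atoms, strings and the anonymous variable.

```python
import re

# Prolog variables start with an upper-case letter or an underscore.
VAR = re.compile(r"\b[A-Z_][A-Za-z0-9_]*\b")

def variables(text):
    """Return the variable names in text, in order of first appearance."""
    seen = []
    for name in VAR.findall(text):
        if name not in seen:
            seen.append(name)
    return seen

def extract_predicate(head, goals, start, end, new_name):
    """Move goals[start:end] into a new predicate called new_name.

    Returns (rewritten_clause, new_clause) as strings.
    """
    selected = goals[start:end]
    outside = variables(" ".join([head] + goals[:start] + goals[end:]))
    # Only variables shared with the rest of the clause become arguments;
    # variables local to the selection stay local to the new predicate.
    args = [v for v in variables(" ".join(selected)) if v in outside]
    call = f"{new_name}({', '.join(args)})" if args else new_name
    new_body = goals[:start] + [call] + goals[end:]
    rewritten = f"{head} :- {', '.join(new_body)}."
    new_clause = f"{call} :- {', '.join(selected)}."
    return rewritten, new_clause

# Hypothetical clause: the user selects the two arithmetic goals.
head = "weather_in_us(City, TC)"
goals = ["temperature(City, TF)", "Temp is TF - 32", "TC is Temp * 5 / 9"]
rewritten, new_clause = extract_predicate(head, goals, 1, 3, "fahrenheit_to_celsius")
print(rewritten)
# weather_in_us(City, TC) :- temperature(City, TF), fahrenheit_to_celsius(TF, TC).
print(new_clause)
# fahrenheit_to_celsius(TF, TC) :- Temp is TF - 32, TC is Temp * 5 / 9.
```

Note that Temp does not become an argument of the new predicate, because it occurs only inside the selected goals.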

ViPReSS has been successfully applied to a large (more than 53,000 lines) legacy
system used at the Computer Science department of the Katholieke Universiteit Leuven
to manage the educational activities. The system, called BTW (Flemish for
value-added tax), has been developed and extended since the early eighties by more than ten
different programmers, many of whom are no longer employed by the department. The
implementation has been done in the MasterProLog [8] system that, to the best of our
knowledge, is no longer supported.

By using the refactoring techniques we succeeded in obtaining a better understand- 
ing of this real-world system, in improving its structure and maintainability, and in 
preparing it for further intended changes such as porting it to a state-of-the-art Prolog 
system and adapting it to new educational tasks the department is facing as a part of the 
unified Bachelor-Master system in Europe. 

We started by removing some parts of the system that have been identified by the
experts as obsolete, including out-of-fashion user interfaces and outdated versions of
program files. The bulk of dead code was eliminated in this way, reducing the system
size to a mere 20,000 lines.

Next, we applied most of the system-scope refactorings described above. Even after
removal of dead code by the experts, ViPReSS identified and eliminated 299 dead
predicates. This reduced the size by another 1,500 lines. Moreover, ViPReSS discovered 79
pairwise identical predicates. In most of the cases, identical predicates were moved to
new modules used by the original ones. The previous steps allowed us to improve the
overall structure of the program by reducing the number of files from 294 to 116 files
with a total of 18,000 lines. Very little time was spent to bring the system into this state.
The experts were sufficiently familiar with the system to immediately identify obsolete
parts. The system-scope refactorings took only a few minutes each.
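
One common way to find dead predicates, not necessarily the one ViPReSS uses, is a reachability search over the call graph starting from the known entry points. The sketch below assumes that the call graph is already available as a dictionary. The predicate names and the choice of entry point are made up for the example, and calls made through meta-predicates such as call/N are not taken into account.

```python
from collections import deque

def dead_predicates(call_graph, entry_points):
    """Return the defined predicates that no entry point can reach.

    call_graph maps each defined predicate (e.g. "main/0") to the set of
    predicates called in the bodies of its clauses.
    """
    reachable = set()
    queue = deque(p for p in entry_points if p in call_graph)
    while queue:
        pred = queue.popleft()
        if pred in reachable:
            continue
        reachable.add(pred)
        for callee in call_graph[pred]:
            # Built-ins and library predicates are not in the graph: skip them.
            if callee in call_graph and callee not in reachable:
                queue.append(callee)
    return sorted(set(call_graph) - reachable)

# Hypothetical call graph of a small program.
call_graph = {
    "main/0": {"report/1", "load/1"},
    "report/1": {"format_line/2", "write/1"},
    "load/1": set(),
    "format_line/2": set(),
    "old_menu/0": {"old_prompt/1"},   # only reachable from dead code
    "old_prompt/1": set(),
}
print(dead_predicates(call_graph, ["main/0"]))
# ['old_menu/0', 'old_prompt/1']
```

Here old_prompt/1 is reported even though it has a caller, because that caller is itself dead.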

The second step of refactoring consisted of a thorough code inspection aimed at
local improvement. Many malpractices have been identified, the most notorious being
excessive use of cut combined with producing the output before commit. Additional
“bad smells” discovered include bad predicate names such as q, unused arguments,
and unifications instead of identity checks or numerical equalities. Some of these
were located by ViPReSS, others were recognised by the users, while ViPReSS
performed the corresponding transformations. This step is more demanding of the user.
She has to consider all potential candidates for refactoring separately and decide
which transformations apply. Hence, the lion’s share of the refactoring time is spent on
these local changes.

In summary, from the case study we learned that automatic support for refactoring
techniques is essential and that ViPReSS is well-suited for this task. As a result of
applying refactoring to BTW we obtained better-structured, lumber-free code. It is now
not only more readable and understandable but also makes the intended changes simpler
to implement. From our experience with refactoring this large legacy system and
the relative time investments of the global and the local refactorings, we recommend
starting out with the global ones and then selectively applying local refactorings as the
need arises.

A version of ViPReSS to refactor SICStus programs can be downloaded from:
http://www.cs.kuleuven.ac.be/~toms/vipress. The current version, 0.2.1, consists
of 1,559 lines of code and can also refactor ISO Prolog programs. Dependencies
on the system-specific built-ins and the module system have been separated as much as
possible from the refactoring logic. This should make it fairly easy to refactor other
Prolog variants as well.



5 Conclusions and Future Work 

In this paper we have shown that the ideas of refactoring are applicable and important
for logic programming. Refactoring helps bridge the gap between prototypes and
real-world applications. Indeed, extending a prototype to provide additional functionality
often leads to cumbersome code. Refactoring allows software developers both to clean
up code after changes and to prepare code for future changes.

We have presented a catalogue of refactorings at different scopes of a program, containing
both previously known refactorings for object-oriented languages, now adapted for Prolog,
and entirely new Prolog-specific refactorings. Although the presented refactorings
do require human input, as is in the general spirit of refactoring, a large part of the
work can be automated. Our refactoring browser ViPReSS integrates the automatable
parts of the presented refactorings in the VIM editor.

Logic programming languages and refactoring have already been put together at
different levels. Tarau [20] has refactored the Prolog language itself. However, this
approach differs significantly from the traditional notion of refactoring as introduced by
Fowler [6]. We follow the latter definition. Recent relevant work is [21] in the context
of object-oriented languages: a meta-logic very similar to Prolog is used to detect, for
instance, obsolete parameters.

None of these papers, however, considers applying refactoring techniques to logic
programs. In our previous work [18] we have emphasised the importance of refactoring
for logic programming and discussed the applicability of the refactoring techniques
developed for object-oriented languages to Prolog and CLP programs. Seipel et al. [17]
include refactoring among the analysis and visualisation techniques that can be easily
implemented by means of FnQuery, a Prolog-inspired query language for XML.
However, the discussion stays at the level of an example and no detailed study has been
conducted.

In the logic programming community questions related to refactoring have been
intensively studied in the context of program transformation and specialisation [3, 4, 10, 14].
There are two important differences with this line of work. Firstly, refactoring does not
aim at optimising performance but at improving readability, maintainability and
extensibility. In the past these features were often sacrificed to achieve efficiency. Secondly,
user input is essential in the refactoring process while traditionally only automatic
approaches were considered. Moreover, program transformations are usually part of a
compiler and hence they are “invisible” to the program developer. However, some of
the transformations developed for program optimisation, e.g. dead code elimination,
can be considered as refactorings and should be implemented in refactoring browsers.

To further increase the level of automation of particular refactorings, additional
information such as types and modes can be used. To obtain this information the
refactoring system could be extended with type and mode analyses. On the other hand, it seems
worthwhile to consider the proposed refactorings in the context of languages with type
and mode declarations like Mercury [19], especially as these languages claim to be
of greater relevance for programming in the large than traditional Prolog. Moreover,
dealing with higher-order features is essential for refactoring in a real-world context.
The above-mentioned languages with explicit declarations for such constructs would
facilitate the implementation of an industrial-strength refactoring environment.



References 

1. The -> operator. Association for Logic Programming Newsletter, 4(2):10-12, 1991.

2. Information technology - Programming languages - Prolog - Part 1: General core. ISO/IEC,
1995. ISO/IEC 13211-1:1995.

3. Y. Deville. Logic Programming: Systematic program development. Addison-Wesley, 1990.

4. S. Etalle, M. Gabbrielli, and M. C. Meo. Transformations of CCP programs. ACM
Transactions on Programming Languages and Systems, 23(3):304-395, May 2001.

5. M. Fowler. Refactorings in alphabetical order.
Available at http://www.refactoring.com/catalog/, 2003.

6. M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts. Refactoring: improving the design
of existing code. Object Technology Series. Addison-Wesley, 1999.

7. Intelligent Systems Laboratory. SICStus Prolog User’s Manual. PO Box 1263, SE-164 29
Kista, Sweden, October 2003.

8. IT Masters. MasterProLog Programming Environment. www.itmasters.com, 2000.

9. N. Jacobs and H. Blockeel. The learning shell: Automated macro construction. In User
Modeling 2001, volume 2109 of LNAI, pages 34-43. Springer Verlag, 2001.

10. M. Leuschel and M. H. Sørensen. Redundant argument filtering of logic programs. In
J. Gallagher, editor, Proceedings of the 6th International Workshop on Logic Program Synthesis
and Transformation, volume 1207 of LNCS, pages 83-103. Springer Verlag, 1996.

11. H. Li, C. Reinke, and S. Thompson. Tool support for refactoring functional programs. In
J. Jeuring, editor, Haskell Workshop 2003. Association for Computing Machinery, 2003.

12. R. A. O’Keefe. The Craft of Prolog. MIT Press, Cambridge, MA, USA, 1994.







Tom Schrijvers and Alexander Serebrenik 



13. W. F. Opdyke. Refactoring object-oriented frameworks. PhD thesis, University of Illinois at
Urbana-Champaign, 1992.

14. A. Pettorossi and M. Proietti. Transformation of logic programs: Foundations and techniques.
Journal of Logic Programming, 19/20:261-320, May/July 1994.

15. D. Roberts, J. Brant, and R. Johnson. A refactoring tool for Smalltalk. Theory and Practice
of Object Systems (TAPOS), 3(4):253-263, 1997.

16. T. Schrijvers, A. Serebrenik, and B. Demoen. Refactoring Prolog programs. Technical Report
CW373, Department of Computer Science, K.U.Leuven, December 2003.

17. D. Seipel, M. Hopfner, and B. Heumesser. Analysing and visualizing Prolog programs based
on XML representations. In F. Mesnard and A. Serebrenik, editors, Proceedings of the 13th
International Workshop on Logic Programming Environments, pages 31-45, 2003. Published
as technical report CW371 of Katholieke Universiteit Leuven.

18. A. Serebrenik and B. Demoen. Refactoring logic programs. Poster. Nineteenth International
Conference on Logic Programming, Mumbai, India, December 9-13, 2003.

19. Z. Somogyi, F. Henderson, and T. Conway. Mercury: an efficient purely declarative logic
programming language. In Australian Computer Science Conference.

20. P. Tarau. Fluents: A refactoring of Prolog for uniform reflection and interoperation with
external objects. In Computational Logic, First International Conference, London, UK, July
2000, Proceedings, volume 1861 of LNAI, pages 1225-1239. Springer Verlag, 2000.

21. T. Tourwé and T. Mens. Identifying refactoring opportunities using logic meta programming.
In 7th European Conference on Software Maintenance and Reengineering, Proceedings,
pages 91-100. IEEE Computer Society, 2003.




Smodels with CLP and Its Applications: 
A Simple and Effective Approach 
to Aggregates in ASP 



Islam Elkabani, Enrico Pontelli, and Tran Cao Son 



Department of Computer Science 
New Mexico State University 

{ielkaban, epontell, tson}@cs.nmsu.edu



Abstract. In this work we propose a semantically well-founded extension
of Answer Set Programming (ASP) with aggregates, which relies on
the integration between answer set solvers and constraint logic programming
systems. The resulting system is efficient, flexible, and supports
forms of aggregation more general than those previously proposed in the
literature. The system is developed as an instance of a general framework
for the embedding of arbitrary constraint theories within ASP.



1 Introduction 

In recent years we have witnessed the rapid development of alternative logical systems,
called non-monotonic logics [2], which allow new axioms to retract existing
theorems; these logical systems are particularly adequate for common-sense reasoning
and modeling of dynamic and incomplete knowledge. In particular, in the
last few years a novel programming paradigm based on non-monotonic logics has
arisen, called Answer Set Programming (ASP) [12], which builds on the mathematical
foundations of logic programming (LP) and non-monotonic reasoning.
ASP offers novel and declarative solutions in well-defined application areas, such
as intelligent agents, planning, and diagnosis.

Many practical systems have been recently proposed to support execution 
of Answer Set Programming (ASP) [15,6]. The logic-based languages provided 
by these systems offer a variety of syntactic structures, aimed at supporting 
the requirements arising from different application domains. Smodels and DLV 
have pioneered the introduction of language-level extensions, such as choice- 
literals, weight and cardinality constraints, weak constraints [15, 6] to facilitate 
the declarative development of applications. Nevertheless, there are simple prop- 
erties, commonly encountered in real-world applications, that cannot be con- 
veniently handled within the current framework of ASP - such as properties 
dealing with arithmetic and aggregation. In particular, aggregations and other 
forms of set constructions have been shown [5, 3, 13, 9] to be essential to reduce 
the complexity of software development and to improve the declarative level of 
the programming framework. In ASP, the lack of aggregation may lead to an 
exponential growth in the number of rules required for encoding a problem [1].

The objective of this work is to address some of these aspects within the
framework of Answer Set Programming. In particular, the main concrete objective
we propose to accomplish is to develop an extension of ASP which supports
a semantically well-founded, flexible, and efficient implementation of aggregates.
The model of aggregates we provide is more general than the form of aggregates
currently present in the Smodels system [15] and those proposed in the A-Prolog
system [7]. The DLV system has recently reported an excellent development to
allow aggregates in ASP [3]; the DLV approach covers similar classes of aggregates
as those described here, although our proposal follows a radically different
methodology. The model we propose has the advantage of allowing developers to
easily generalize to other classes of aggregates, to modify the strategies employed
during evaluation, and even to accommodate different semantics. A proposal
for optimization of aggregate constraints has appeared in [14].

We follow a fairly general and flexible approach to address these issues. We 
start by offering a generic framework, called ASP-CLP, which provides a simple 
and elegant treatment of extensions of ASP w.r.t. generic constraint domains. 
We then instantiate this generic framework to the case of a constraint theory 
for aggregates. The resulting language, called ASP-CLP(Agg), is then imple- 
mented following the same strategy - i.e., by relying on the integration between 
a state-of-the-art ASP solver, specifically the Smodels system [15], and an exter- 
nal constraint solver. Instead of relying directly on an external constraint solver 
for aggregates, we make use of an external constraint solver for Finite Domain 
constraints (ECLiPSe [17]). The implementation is simple and elegant, and it 
supports easy modifications of aggregates and execution strategies. 

2 ASP with CLP: A General Perspective 

Let us consider a signature $\Sigma_C = \langle \mathcal{F}_C, \mathcal{V}, \Pi_C \rangle$, where $\mathcal{F}_C$ is a set of constants
and function symbols, $\mathcal{V}$ is a denumerable set of variables, and $\Pi_C$ is a collection
of predicates. We will refer to $\Sigma_C$ as the constraint signature and it will be
used to build constraint formulae. A primitive constraint is an atom of the form
$p(t_1, \dots, t_n)$, where $p \in \Pi_C$ and $t_1, \dots, t_n$ are terms built from symbols of $\mathcal{F}_C \cup \mathcal{V}$.
A C-constraint is a conjunction of primitive constraints and their negation.

Let us also assume a separate signature $\Sigma_P = \langle \mathcal{F}_P, \mathcal{V}', \Pi_P \rangle$, where $\mathcal{F}_P$ is
a collection of constants, $\mathcal{V}'$ is a denumerable collection of variables and $\Pi_P$
is a collection of predicate symbols. We will refer to $\Sigma_P$ as the ASP signature
and we will denote with $\mathcal{H}_P$ the Herbrand universe built from the symbols of
$\Sigma_P$ and with $\mathcal{B}_P$ the Herbrand base. We will refer to an atom $p(t_1, \dots, t_n)$,
where $t_i \in \mathcal{F}_P \cup \mathcal{V}'$ and $p \in \Pi_P$, as an ASP-atom; an ASP-literal is either an
ASP-atom or the negation ($not\ A$) of an ASP-atom. An ASP-clause is a formula
$A \texttt{ :- } B_1, \dots, B_k$ where $A$ is an ASP-atom and $B_1, \dots, B_k$ are ASP-literals. An
ASP-CLP clause is a formula of the form $A \texttt{ :- } C \,\Box\, B_1, \dots, B_k$ where $A$ is an
ASP-atom, $C$ is a C-constraint, and $B_1, \dots, B_k$ are ASP-literals. A program is
a finite collection of ASP-CLP clauses.

We assume that an interpretation structure $\mathcal{A}_C = \langle A, (\cdot)^C \rangle$ for the constraint
signature is given, where $A$ is the domain and $(\cdot)^C$ is a function mapping elements
of $\mathcal{F}_C$ ($\Pi_C$) to functions (relations) over $A$. Given a primitive constraint $c$, we
will use the notation $\mathcal{A}_C \models c$ iff $(c)^C$ is true; the notion can be easily generalized
to constraints. Let $P$ be a ground ASP-CLP program and let $M \subseteq \mathcal{B}_P$. We define
the ASP-CLP-reduct of $P$ w.r.t. $M$ as the set of ground clauses

$$P^M = \left\{ A \texttt{ :- } B_1, \dots, B_n \;\middle|\; \begin{array}{l} (A \texttt{ :- } C \,\Box\, B_1, \dots, B_n, not\ D_1, \dots, not\ D_m) \in P, \\ M \not\models D_i \ (1 \le i \le m), \\ \mathcal{A}_C \models C^M \end{array} \right\}$$

where $C^M$ denotes the grounding of $C$ w.r.t. the interpretation $M$. $M$ is an
ASP-CLP-stable model of $P$ iff $M$ is the least Herbrand model of $P^M$.

3 ASP with CLP: Aggregates in ASP 

Our objective is to introduce different types of aggregates in ASP. Database
query languages (e.g., SQL) use aggregate functions - such as sum, count, max,
and min - to obtain summary information from a database. Aggregates have
been shown to significantly improve the compactness and clarity of programs in
various flavors of logic programming [11, 5, 3]. We expect to gain similar advantages
from the introduction of different forms of aggregations in ASP.

Example 1 ([13]). Let owns(X, Y, N) denote the fact that company X owns a
fraction N of the shares of company Y. A company X controls a company Y if
the sum of the shares it owns in Y together with the sum of the shares owned in
Y by companies controlled by X is greater than half of the total shares of Y¹.



control(X, X, Y, N) :- owns(X, Y, N).

control(X, Z, Y, N) :- control(X, Z), owns(Z, Y, N).

fraction(X, Y, N) :- sum({{ M : (control(X, Z, Y, M) : company(Z)) }}) = N.

control(X, Y) :- fraction(X, Y, N), N > 0.5.
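
To make the recursion through the sum aggregate concrete, the following sketch computes the smallest control relation by iterating the rules until nothing changes. It is an illustration only and not how an ASP solver evaluates the program: the function name, the dictionary encoding of owns and the sample shareholdings are assumptions introduced for the example.

```python
def controls(owns, companies):
    """Smallest control relation for the company-control rules.

    owns maps (owner, owned) to the fraction of shares held.
    """
    control = set()
    changed = True
    while changed:
        changed = False
        for x in companies:
            for y in companies:
                if (x, y) in control:
                    continue
                # Shares of y held by x itself ...
                total = owns.get((x, y), 0.0)
                # ... plus shares of y held by companies that x already controls.
                total += sum(owns.get((z, y), 0.0)
                             for z in companies
                             if z != x and (x, z) in control)
                if total > 0.5:
                    control.add((x, y))
                    changed = True
    return control

# Hypothetical shareholdings: a owns 60% of b, and a and b each own 30% of c.
owns = {("a", "b"): 0.6, ("a", "c"): 0.3, ("b", "c"): 0.3}
result = controls(owns, ["a", "b", "c"])
print(sorted(result))
# [('a', 'b'), ('a', 'c')]
```

Company a controls c only indirectly: its own 30% is not enough, but the 30% held by the controlled company b is added to it.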



A significant body of research has been developed in the database and in the
constraint programming communities exploring the theoretical foundations and,
in a more limited fashion, the algorithmic properties of aggregation constructs in
logic programming (e.g., [11, 16, 13, 4]). More limited attention has been devoted
to the more practical aspects related to computing in logic programming in the
presence of aggregates. In [1], it has been shown that aggregate functions can
be encoded in ASP (e.g., Example 1 above). The main disadvantage of this
proposal is that the obtained encoding contains several intermediate variables,
thus making the grounding phase quite expensive in terms of space and time.
Recently, a number of proposals to extend logic programming with aggregates
have been developed, including work on the use of aggregates in ASET [7],
work on sets and grouping in logic programming [5], and a recently proposed
implementation of aggregates in the DLV system [3].

1 For the sake of simplicity we omitted the domain predicates required by Smodels. 

The specific approach proposed in this work accomplishes the same objectives
as [3, 7]. The novelty of our approach lies in the technique adopted to support
aggregates. We rely on the integration of different constraint solving technologies
to support the management of different flavors of sets and aggregates. In this
paper, we describe a back-end inference engine - obtained by integrating Smodels
with a finite-domain constraint solver - capable of executing Smodels programs
with aggregates. The back-end is meant to be used in conjunction with front-ends
capable of performing high-level constraint handling of sets and aggregates
(as in [5]). We will refer to the resulting system as ASP-CLP(Agg) hereafter.

4 The Language 

Now we will give a formal definition of the syntax and semantics of the language
accepted by the ASP-CLP(Agg) system. This language is an extension of the
Smodels language, with the addition of aggregate functions.

Syntax. The input language accepted by our system is analogous to the language 
of Smodels with the exception of a new class of literals - the aggregate literals. 

Definition 1. An extensional set (multiset) is a set (multiset) of the form
$\{a_1, \dots, a_k\}$ ($\{\!\{a_1, \dots, a_k\}\!\}$) where the $a_i$ are terms. An intensional set is a set of
the form $\{X : Goal[X, \bar{Y}]\}$; an intensional multiset is a multiset of the form
$\{\!\{X : Goal[X, \bar{Y}]\}\!\}$. In both definitions, $X$ is the grouping variable while $\bar{Y}$ are
existentially quantified variables. Following the syntactic structure of Smodels,
$Goal[X, \bar{Y}]$ is an expression of the form: $p(X)$ ($\bar{Y}$ is empty), where $p \in \Pi_P$, or
$p(X, \bar{Y}) : q(\bar{Y})$, where $p, q \in \Pi_P$ and $q(\bar{Y})$ is the domain predicate for $\bar{Y}$ [15]. An
intensional set (multiset) is ground if $vars(Goal[X, \bar{Y}]) = \{X, \bar{Y}\}$. An aggregate
term is of the form $aggr(S)$, where $S$ is either an extensional or intensional set
or multiset, and $aggr$ is a function. We will mostly focus on the handling of the
“traditional” aggregate functions, i.e., count, sum, min, max, avg, times.

With respect to the generic syntax of Section 2, in ASP-CLP(Agg) we assume
that $\mathcal{F}_C$ contains a collection of function symbols of the type $F^{\{\}}_{Goal[X,\bar{Y}]}$ and
$F^{\{\!\{\}\!\}}_{Goal[X,\bar{Y}]}$. The arity of each such function symbol corresponds to the number of
free variables different from the grouping variable and the existentially quantified
variables present in $Goal[X, \bar{Y}]$ - i.e.,

$$arity(F^{\{\}}_{Goal[X,\bar{Y}]}) = |vars(Goal[X,\bar{Y}]) \setminus \{X, \bar{Y}\}| = arity(F^{\{\!\{\}\!\}}_{Goal[X,\bar{Y}]})$$

Thus, $F(\{X : Goal[X,\bar{Y}]\})$ and $F(\{\!\{X : Goal[X,\bar{Y}]\}\!\})$ are syntactic sugars
for the terms $F^{\{\}}_{Goal[X,\bar{Y}]}(Z_1, \dots, Z_n)$ and $F^{\{\!\{\}\!\}}_{Goal[X,\bar{Y}]}(Z_1, \dots, Z_n)$, where
$\{Z_1, \dots, Z_n\} = vars(Goal[X,\bar{Y}]) \setminus \{X, \bar{Y}\}$. Similar definitions apply to the case
of aggregate terms built using extensional sets.

Definition 2. Aggregate literals are of the form $aggr(S)\ op\ Result$, where $op$ is
a relation from the set $\{=, \neq, <, >, \le, \ge\}$ and $Result$ is a variable or a number.

The assumption is that the language of Section 2 is instantiated with
$\Pi_C = \{=, \neq, <, >, \ge, \le\}$. Observe that the variables $X$, $\bar{Y}$ are locally quantified within the
aggregate. At this time, the aggregate literal cannot play the role of a domain
predicate - thus other variables appearing in an aggregate literal (e.g., Result)
are treated in the same way as variables appearing in a negative literal.

Definition 3. An ASP-CLP(Agg) rule is of the form $A \texttt{ :- } L_1, \dots, L_n$ where
$A$ is a positive literal and $L_1, \dots, L_n$ are standard literals or aggregate literals².
An ASP-CLP(Agg) program is a set of ASP-CLP(Agg) rules.

In ASP-CLP(Agg), we have opted for relaxing the stratification requirements
proposed in [7], which prevent the introduction of recursion through aggregates.
The price to pay is the presence of non-minimal models [5, 11]; on the other
hand, the literature has highlighted situations where stratification of aggregates
prevents natural solutions to problems [13, 4].

Semantics. Now we will provide the stable model semantics [8] of the language,
based on the interpretation of the aggregate atoms. The construction is simply
an appropriate instantiation of the general definition provided in Section 2. Let
us start with some terminology. Given a set $A$, we denote with $\mathcal{M}(A)$ the set of
all finite multisets composed of elements of $A$, and let us denote with $\mathcal{P}(A)$ the
set of all finite subsets of $A$.

Given a ground intensional set (multiset) term $s = \{X : Goal[X,\bar{Y}]\}$
($s = \{\!\{X : Goal[X,\bar{Y}]\}\!\}$), and given an interpretation $M \subseteq \mathcal{B}_P$, the grounding of $s$ w.r.t.
$M$ (denoted by $s^M$) is the ground extensional set (multiset) term $\{a_1, \dots, a_k\}$
($\{\!\{a_1, \dots, a_k\}\!\}$) where $\{(a_1,b_1), \dots, (a_k,b_k)\} = \{(x,y) \mid M \models Goal[x,y]\}$.
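
The difference between grounding a set and grounding a multiset can be illustrated with a short sketch. The encoding of ground atoms as tuples, the function name and the sample atoms are assumptions made for this illustration. The goal is restricted to a single binary predicate p(X, Y), with X the grouping variable.

```python
from collections import Counter

def ground_collection(model, pred, multiset=False):
    """Ground {X : pred(X, Y)}, or its multiset variant, w.r.t. a model.

    model is a set of ground atoms written as tuples (predicate, arg1, ...).
    """
    # The pairs (a_i, b_i) for which the model satisfies pred(a_i, b_i).
    pairs = {(atom[1], atom[2]) for atom in model
             if atom[0] == pred and len(atom) == 3}
    values = [x for (x, _) in pairs]
    # A multiset keeps one copy of x per witness y; a set keeps one copy overall.
    return Counter(values) if multiset else set(values)

# Hypothetical interpretation M.
M = {("salary", 100, "ann"), ("salary", 100, "bob"), ("salary", 250, "eve"),
     ("dept", 1, "ann")}
as_set = ground_collection(M, "salary")
as_multiset = ground_collection(M, "salary", multiset=True)
print(sorted(as_set))                               # [100, 250]
print(sorted(as_multiset.elements()))               # [100, 100, 250]
print(sum(as_set), sum(as_multiset.elements()))     # 350 450
```

The two groundings give different results for sum (350 versus 450) and for count (2 versus 3), which is why the two kinds of aggregate terms are kept apart.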

As we have seen in Section 2, we assume the existence of a predefined interpretation
structure $\mathcal{A}_C = \langle A, (\cdot)^C \rangle$ for the constraint part of the language. In
our case, the interpretation is meant to describe the meaning of the aggregate
functions and relations of ASP-CLP(Agg).

Let us start by defining the interpretation for the function symbols $F^{\alpha}$,
where $F$ is an aggregate operation (e.g., sum, count) and $\alpha$ is either $\{\}$ or
$\{\!\{\}\!\}$. Intuitively, each aggregate function is interpreted as a function over sets
or multisets of integers. In particular, we assume that standard aggregate functions
are interpreted according to their usual meaning, e.g., ($\alpha$ is an element
of $\{\{\}, \{\!\{\}\!\}\}$) $(sum^{\alpha})^C$ is the function that maps a set (multiset) of integers
to their sum, $(count^{\alpha})^C$ is a function mapping a set (multiset) of integers
to its cardinality, etc. If $s$ is the extensional set (multiset) term $\{a_1, \dots, a_k\}$
($\{\!\{a_1, \dots, a_k\}\!\}$), then $(F^{\{\}}_s)^C$ ($(F^{\{\!\{\}\!\}}_s)^C$) is simply defined as the element of $\mathbb{Z}$:
$(F^{\{\}}_s)^C = (F^{\{\}})^C(\{a_1^C, \dots, a_k^C\})$ and $(F^{\{\!\{\}\!\}}_s)^C = (F^{\{\!\{\}\!\}})^C(\{\!\{a_1^C, \dots, a_k^C\}\!\})$.

2 We do not distinguish between the constraint and non-constraint part of each rule.

Given an interpretation $M \subseteq \mathcal{B}_P$ and given $Goal[X,\bar{Y}]$ such that
$vars(Goal[X,\bar{Y}]) = \{X, \bar{Y}\}$, for each $F^{\{\}}_{Goal[X,\bar{Y}]} \in \mathcal{F}_C$ and for each
$F^{\{\!\{\}\!\}}_{Goal[X,\bar{Y}]} \in \mathcal{F}_C$ we assume that $(F^{\{\}}_{Goal[X,\bar{Y}]})^C = (F^{\{\}}_{Goal[X,\bar{Y}]^M})^C$ and
$(F^{\{\!\{\}\!\}}_{Goal[X,\bar{Y}]})^C = (F^{\{\!\{\}\!\}}_{Goal[X,\bar{Y}]^M})^C$. Similarly, we assume that the constraint structure
interprets the various relational operators employed in the construction of aggregate atoms -
i.e., $=, \neq, <, >, \ge, \le$ - according to their intuitive meaning as comparisons
between integer numbers. We can summarize the notion of satisfaction as follows:

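Definition 4. Given $M \subseteq \mathcal{B}_P$ and a ground aggregate atom $F(\{X : Goal[X,\bar{Y}]\})\ op\ R$,
then $\mathcal{A}_C \models (F(\{X : Goal[X,\bar{Y}]\})\ op\ R)^M$ iff

$$(F^{\{\}})^C(\{a \in \mathbb{Z} \mid \exists b \in \mathcal{H}_P.\ M \models Goal[a,b]\})\ op\ R$$

is true w.r.t. $\mathcal{A}_C$. Similarly, given a ground aggregate atom
$F(\{\!\{X : Goal[X,\bar{Y}]\}\!\})\ op\ R$, then $\mathcal{A}_C \models (F(\{\!\{X : Goal[X,\bar{Y}]\}\!\})\ op\ R)^M$ iff

$$(F^{\{\!\{\}\!\}})^C(\{\!\{a \in \mathbb{Z} \mid \exists b \in \mathcal{H}_P.\ M \models Goal[a,b]\}\!\})\ op\ R$$

is true w.r.t. $\mathcal{A}_C$.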

The remaining aspects of the notion of stable models are immediately derived
from the definitions in Section 2, using the notion of entailment of constraints
defined in Definition 4. Observe that in the case of aggregates, this semantics
definition essentially coincides with the semantics proposed by a number of authors,
e.g., [11, 7, 6]. Observe also that this semantics characterization has some
drawbacks - in particular, there are situations where recursion through aggregates
leads to non-minimal models [11, 4], as shown in the following example.

Example 2. Consider the following program: 

p(1). p(2). p(3). p(5) :- q. q :- sum({X : p(X)}) > 10.

This program contains recursion through aggregates, and has two answer sets:
$A_1 = \{p(1), p(2), p(3)\}$ and $A_2 = \{p(1), p(2), p(3), p(5), q\}$. $A_2$ is not minimal.
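
The two answer sets can be checked by brute force against the reduct-based definition of Section 2: for every candidate set of atoms, evaluate the aggregate in the candidate, build the reduct, and compare its least model with the candidate. The sketch below hard-codes the program of Example 2. The function and variable names are chosen for this illustration, and the exhaustive enumeration is not how the ASP-CLP(Agg) system computes stable models.

```python
from itertools import chain, combinations

FACTS = {"p(1)", "p(2)", "p(3)"}
ATOMS = sorted(FACTS | {"p(5)", "q"})
P_VALUES = {"p(1)": 1, "p(2)": 2, "p(3)": 3, "p(5)": 5}

def aggregate_holds(model):
    """Evaluate sum({X : p(X)}) > 10 against a candidate model."""
    return sum(v for atom, v in P_VALUES.items() if atom in model) > 10

def reduct(model):
    """Rules (head, positive_body) that survive the reduct w.r.t. model."""
    rules = [(fact, []) for fact in sorted(FACTS)]
    rules.append(("p(5)", ["q"]))      # p(5) :- q.
    if aggregate_holds(model):         # q :- sum({X : p(X)}) > 10.
        rules.append(("q", []))        # constraint satisfied: keep the rule without it
    return rules

def least_model(rules):
    """Least Herbrand model of a ground definite program."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

def stable_models():
    """All candidates that coincide with the least model of their reduct."""
    candidates = chain.from_iterable(
        combinations(ATOMS, n) for n in range(len(ATOMS) + 1))
    return [set(c) for c in candidates if least_model(reduct(set(c))) == set(c)]

models = stable_models()
for m in models:
    print(sorted(m))
# ['p(1)', 'p(2)', 'p(3)']
# ['p(1)', 'p(2)', 'p(3)', 'p(5)', 'q']
```

The second model supports itself: p(5) is needed to push the sum above 10, the sum is needed to derive q, and q is needed to derive p(5).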

5 Practical Integration of Aggregates in ASP 

The overall design of the proposed ASP-CLP(Agg) system is illustrated in Figure 1.
The structure resembles the typical structure of most ASP solvers - i.e., a
preprocessing phase, which is employed to simplify the input program and produce an
appropriate ground version, is followed by the execution of an actual solver to
determine the stable models of the program. The preprocessor is enriched with a
module used to determine dependencies between constraints present in the input program
and regular ASP atoms; in our case, the preprocessor detects the dependences
between aggregates used in the program and the atoms that directly contribute
to such aggregates. The result of the dependence analysis is passed to the solver
(along with the ground program) to allow the creation of data structures to
manage the constraints.

Fig. 1. Overall Design

The solver is a combination of a traditional ASP solver - in charge of handling
the program rules and controlling the flow of execution - and a constraint
solver; the ASP solver sends the constraints to be evaluated to the external
solver, which in turn returns instantiations of the boolean variables representing
the components of the constraint. Intuitively, the constraint solver is employed
to determine under what conditions (in terms of truth values of standard ASP
atoms) a certain constraint will be satisfied. The result of the constraint
processing will be used by ASP to modify the structure of the stable model under
construction. Thus, the constraint solver is a “helper” in the computation of
the stable models; at any point in time, relations between standard atoms exist
within the data structures of Smodels while numerical relations expressed by
aggregates exist within the constraint store. In the specific case of ASP-CLP(Agg),
the solver used to handle aggregate constraints is implemented using a solver
over finite domains.

5.1 Representing Aggregates as Finite Domain Constraints 

As described in Figure 1, each aggregate constraint in an ASP-CLP(Agg) program
is managed through a finite domain constraint solver. This section discusses the 
encoding of aggregate constraints as finite domain constraints. 

First, each atom appearing in an aggregate is represented as a domain variable
with domain 0..1; the whole aggregate is then expressed as a constraint
involving such variables. The intuition behind this transformation is to take
advantage of the powerful propagation capabilities of finite domain constraint
solvers to automatically test the satisfiability of an aggregate and prune
alternatives from its solution search space. In this work we rely on the finite
domain constraint solver provided by the ECLiPSe system [17]. Let us summarize
the encoding of the most relevant forms of aggregates used in ASP-CLP(Agg):

• Count Aggregate: An aggregate atom of the form count({{X : Goal[X,Y]}}) op
Result is represented as the finite domain constraint:

X[i_1] + X[i_2] + ... + X[i_n] con_op Result

where the X[i]'s are finite domain constraint variables representing all the
ground atoms of Goal[X,Y], the i's are the indices of the ground atoms in
the atom table and con_op is the ECLiPSe operator corresponding to the
relational operator op. E.g., given the atoms p(1),p(2),p(3), the aggregate
count({{A : p(A)}}) < 3 will lead to the constraint

X[1]::0..1, X[2]::0..1, X[3]::0..1, X[1]+X[2]+X[3] #< 3

where X[1], X[2], X[3] are constraint variables corresponding to
p(1),p(2),p(3) respectively. The handling of the corresponding aggregate in
presence of sets instead of multisets is very similar; it relies on the
preprocessor identifying in advance atoms that provide the same contribution
and combining them with a logical or. E.g., assume that the extension of p
contains the atoms p(1,a),p(1,b),p(2,c) and we have an aggregate of the type
count({X : p(X,Y) : domain(Y)}) > 1; it will be encoded as

[X[i_1], X[i_2], X[i_3]] :: 0..1, (X[i_1] #= 1 #\/ X[i_2] #= 1) #<=> B1, (B1 + X[i_3]) #> 1.

• Sum Aggregate: An aggregate atom of the form sum({{X : Goal[X,Y]}}) op
Result is represented as a finite domain constraint of the form:

X[i_1] * v_{i_1} + X[i_2] * v_{i_2} + ... + X[i_n] * v_{i_n} con_op Result

where the X[i]'s are finite domain variables representing all the ground
atoms of Goal[X,Y], the i's are the indices of the ground atoms in the atom
table, the v_i's are the values of X satisfying the atom Goal[X,Y] and con_op
is the ECLiPSe operator corresponding to the relational operator op. E.g.,
given the atoms p(1),p(2),p(3), the aggregate sum({{A : p(A)}}) < 3 will lead
to the constraint

X[1]::0..1, X[2]::0..1, X[3]::0..1, X[1]*1+X[2]*2+X[3]*3 #< 3

The handling of the aggregates based on intensional sets instead of multisets
follows the same strategy highlighted in the case of the count aggregate.

• Max Aggregate: An aggregate atom of the form max({{X : Goal[X,Y]}}) op
Result is represented as the finite domain constraint:

maxlist([X[i_1] * v_{i_1}, X[i_2] * v_{i_2}, ..., X[i_n] * v_{i_n}]) con_op Result

where the X[i]'s are finite domain constraint variables representing all the
ground atoms of Goal[X,Y], the i's are the indices of the ground atoms in
the atom table, the v_i's are the constants instantiating the atom Goal[X,Y]
and con_op is the ECLiPSe operator corresponding to the relational operator
op. E.g., given the atoms p(1),p(2),p(3), the aggregate max({{A : p(A)}}) < 5
will lead to the constraint

[X[1], X[2], X[3]]::0..1, maxlist([X[1] * 1, X[2] * 2, X[3] * 3]) #< 5

Observe that in this case there is no difference in the encoding if the
aggregate is defined on intensional sets instead of multisets. Observe also
that the contributions of the various atoms will have to be shifted to ensure
that no negative contributions are present.

• Min Aggregate: It might seem that the representation of the
min({{X : Goal[X,Y]}}) op Result aggregate atom as a finite domain constraint
is analogous to that of the max aggregate, the only difference being the use
of minlist/1 instead of maxlist/1. This is not quite true. A problem arises
when the min aggregate is represented in the same way as the max aggregate:
the X[i]'s that represent ground atoms having false truth value are set to 0,
so the minimum of the list is always 0, although the real minimum is the
smallest of the v_i's that correspond to the X[i]'s representing ground atoms
having true truth value. E.g., given the atoms p(3),p(4),p(5), suppose we
already know that p(3) and p(4) are true, while p(5) is false. Using the same
representation as the max aggregate for the aggregate min({{A : p(A)}}) < 2
leads to the constraint

[X[1], X[2], X[3]]::0..1, minlist([X[1] * 3, X[2] * 4, X[3] * 5]) #< 2

This representation is wrong: the constraint evaluates to true, since the
result of applying minlist is 0 and 0 is less than 2, but the correct answer
is false, since the minimum of the values that correspond to ground atoms
having true truth value is 3, which is not less than 2. In order to overcome
this problem, we use the following representation of the aggregate atom
min({X : Goal(X)}) op Result as a finite domain constraint:

Y[i_1] #= (X[i_1]*1)+1,
...
Y[i_n] #= (X[i_n]*n)+1,
element(Y[i_1], [M, v_{i_1}, ..., v_{i_n}], Z[i_1]),
...
element(Y[i_n], [M, v_{i_1}, ..., v_{i_n}], Z[i_n]),
minlist([Z[i_1], Z[i_2], ..., Z[i_n]]) con_op Result

where M is a constant such that M > v_i for all possible values of i, the
Y[i]'s are selector indices that are used to select a value from the list
[M, v_{i_1}, ..., v_{i_n}] to be assigned to the Z[i]'s by using the
fd-library constraint element/3, and the Z[i]'s form the new list of
X[i]*v_i with the exception that each X[i]*v_i that corresponds to an atom
with a false truth value is changed to M. E.g., by applying this to the
previous example, we find that Z[1] is assigned 3, Z[2] is assigned 4 and
Z[3] is assigned a large number. In this case the result of the constraint
minlist([Z[1], Z[2], Z[3]]) #< 2 is false, which is the correct answer
(since 3 < 2 is false).
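
The four encodings can be mimicked on plain 0/1 lists to see why min needs the selector construction. The following is a minimal sketch, not the ECLiPSe code used by the system: ordinary Python arithmetic stands in for the finite domain constraints, the function names are hypothetical, and M is chosen here as max(v) + 1, which is one possible constant satisfying M > v_i.

```python
def count_agg(x):
    # X[i_1] + ... + X[i_n]
    return sum(x)

def sum_agg(x, v):
    # X[i_1] * v_1 + ... + X[i_n] * v_n
    return sum(xi * vi for xi, vi in zip(x, v))

def max_agg(x, v):
    # maxlist([X[i_1] * v_1, ..., X[i_n] * v_n])
    return max(xi * vi for xi, vi in zip(x, v))

def naive_min(x, v):
    # Same shape as max: every false atom contributes 0 and wins the minimum.
    return min(xi * vi for xi, vi in zip(x, v))

def selector_min(x, v):
    m = max(v) + 1                      # assumed choice of M, larger than every v_i
    table = [m] + list(v)               # [M, v_1, ..., v_n]
    y = [xi * k + 1 for k, xi in enumerate(x, start=1)]  # Y_k = X_k * k + 1
    z = [table[yk - 1] for yk in y]     # element/3 uses 1-based positions
    return min(z)

x = [1, 1, 0]          # p(3), p(4) true; p(5) false
v = [3, 4, 5]
print(naive_min(x, v) < 2)      # True: the wrong answer
print(selector_min(x, v) < 2)   # False: the intended answer
```

A false atom selects position 1 of the table (the constant M), while the k-th true atom selects position k+1 (its own value), so false atoms can no longer drag the minimum down to 0.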

5.2 Implementation 

The implementation of ASP-CLP(Agg) has been realized by introducing localized
modifications in the Smodels (V. 2.27) system [15] and by using the ECLiPSe
system (V. 5.4) as a solver for finite domain constraints. In particular, the
implementation makes use of both Smodels - i.e., the actual answer set solver -
and lparse - the front-end used by Smodels to intelligently ground the input
program. In this section we provide details regarding the structure of the
implementation.

Preprocessing. The Preprocessing module is composed of three sequential
steps. In the first step, a program - called Pre-Analyzer - is used to perform
a number of simple syntactic transformations of the input program. The
transformations are mostly aimed at rewriting the aggregate literals in a
format acceptable by lparse. The second step executes the lparse program on the
output of the Pre-Analyzer, producing a ground version of the program in the
Smodels format - i.e., with a numerical encoding of rules and with an explicit
atom table. The third step is performed by the Post-Analyzer, whose major
activities are:

• Identification of the dependencies between aggregate literals and atoms con- 
tributing to such aggregates; these dependencies are explicitly represented. 

• Generation of the constraints encoding the aggregate; e.g., an entry like
“57 sum(x,use(8,x),3,multiset,greater)” in the atom table (describing the
aggregate sum({{X : use(8,X)}}) > 3) is converted to
“57 sum(3,[16,32,48],“X16 * 2 + X32 * 1 + X48 * 4 + 0 #> 3”)” (16, 32, 48 are
indices of use(8,_)).
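
The conversion performed by the Post-Analyzer can be pictured as simple string generation. The sketch below is hypothetical: the function names, the (index, value) input format and the operator keywords other than "greater" are assumptions made for illustration, and only the output shape is taken from the example entry above (with straight quotes in place of the typographic ones).

```python
# Assumed mapping from atom-table keywords to ECLiPSe fd operators;
# only "greater" appears in the example above.
ECLIPSE_OP = {"greater": "#>", "less": "#<", "equal": "#="}

def sum_constraint(contributions, op, bound):
    """contributions: (atom_index, value) pairs for the atoms feeding the sum."""
    terms = [f"X{idx} * {val}" for idx, val in contributions]
    return " + ".join(terms + ["0"]) + f" {ECLIPSE_OP[op]} {bound}"

def convert_entry(atom_id, contributions, op, bound):
    indices = ",".join(str(idx) for idx, _ in contributions)
    constraint = sum_constraint(contributions, op, bound)
    return f'{atom_id} sum({bound},[{indices}],"{constraint}")'

# use(8,2), use(8,1), use(8,4) sit at indices 16, 32, 48 of the atom table
print(convert_entry(57, [(16, 2), (32, 1), (48, 4)], "greater", 3))
```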

Data Structures. We now describe in more detail the modifications made to the
Smodels system data structures in order to extend it with aggregate functions
and make it capable of communicating with the ECLiPSe constraint solver. As
in Smodels, each atom in the program has a separate internal representation - 
including aggregate literals. In particular, each aggregate literal representation 
maintains information regarding what program rules it appears in. The repre- 
sentation of each aggregate literal is similar to that of a standard atom, with the 
exception of some additional fields; these are used to store an ECLiPSe struc- 
ture representing the constraint associated to the aggregate. Each standard atom 
includes a list of pointers to all the aggregate literals depending on such atom. 

Atom: Most of the new data structures that have been added in the new
ASP-CLP(Agg) system are extensions of the class Atom - used by Smodels to
represent one atom. This is because we are introducing a new type of atom
(aggregate literals) which has its own properties. To represent these
properties we have augmented the class Atom with the following fields:

• Atom ** dependents: If this atom is an aggregate constraint, dependents is 
the list of atoms this aggregate depends on. 

• Atom ** constraints: stores the aggregate literals depending on this atom.

• int met_dependents: If this atom is an aggregate constraint, met_dependents
is the number of its dependent atoms that still have unknown truth value.

• EC_word PosCon (NegCon): an ECLiPSe data structure that holds the
positive (negative) constraint to be posted in the constraint store (certain
aggregates require different constraints to assert true and false status).

• EC_ref hook: one domain variable, representing a reified version of the
constraint associated to the current aggregate atom.

Finite Domain Variables: The communication between the Smodels system and
ECLiPSe is two-way. Smodels is capable of posting constraints into the ECLiPSe
constraint solver. In the other direction, ECLiPSe communicates with Smodels
either by sending the truth value of a posted completed aggregate constraint
or by sending back values of labeled variables appearing in a constraint
corresponding to a non-completed aggregate. These types of communication
require Smodels to be able to directly access values of finite domain
variables present in the constraint store managed by ECLiPSe. This can be done
using the ECLiPSe data types EC_refs / EC_ref. We have added the following
data structures in order to handle this situation:

• EC_refs * X is an ECLiPSe structure that holds n references to ECLiPSe
variables, where n is the number of atoms in the ground program. Thus, each
atom is represented in the constraint store by a separate domain variable -
these variables are declared as domain variables with domain 0..1. The
discovery of the truth value of an atom in Smodels can be communicated
to the constraint store by binding the corresponding variable; constraints in
the store can force variables to a selected value, which will be retrieved by
Smodels and transformed into a truth value assignment.

Execution Control. The main flow of execution is directed by Smodels. In
parallel with the construction of the model, our system builds a constraint
store within ECLiPSe. The constraint store maintains one conjunction of
constraints, representing the level of aggregate instantiation achieved so
far. The implementation of ASP-CLP(Agg) required localized changes to various
modules of Smodels. While describing the control of execution, we highlight
some of the main changes that have been applied to the Smodels modules.

Main: during the initialization of the data structures, an additional step is
performed by ASP-CLP(Agg) related to the management of the aggregates. A
collection of declarations and constraints is immediately posted to the
ECLiPSe constraint store; these include:

• If i is the internal index of one atom in Smodels, then a domain variable
X[i] is created and the declaration X[i] :: 0..1 is posted;

• If an aggregate is present in the program and the preprocessor has created
the constraint c for such aggregate, then B_i :: 0..1, c #<=> B_i is posted in
the store. The variable B_i is stored in the Atom structure for the aggregate.

These two steps are illustrated in the post operations (1) and (2) in Figure 2. 

Expand: The goal of the Smodels expand module is to deterministically extend
the set of atoms whose truth values are known. In our ASP-CLP(Agg) system we
extend the expand module in such a way that, each time an aggregate dependent
atom is made true or false, a new constraint is posted in the constraint store.
If i is the index of such atom within Smodels, and the atom is made true
(false), then the constraint X[i] #= 1 (X[i] #= 0) is posted in the ECLiPSe
constraint store (Fig. 2, post operations (3) and (4)). If ECLiPSe returns
EC_fail, a conflict (inconsistency) has been detected, so the control returns
to Smodels where the conflict is handled. Otherwise, ECLiPSe returns
EC_succeed and the control returns to the expand module.

Since aggregate literals are treated by Smodels as standard program atoms, 
they can be made true, false, or guessed. The only difference is that, whenever 
their truth value is decided, a different type of constraint will be posted to the 
store - i.e., the constraint representation of the aggregate. For each aggregate, 
its constraint representation is reified and posted during the initialization. If
the aggregate is determined to be true (false), then we simply need to post a
constraint of the type B_i #= 1 (B_i #= 0), where B_i is the variable reifying
the constraint for the aggregate (Fig. 2, post operation (5)). Observe that the
constraints posted to the store have an active role during the execution: 

• Constraints can provide feedback to Smodels by forcing a truth value for
previously uncovered atoms. This means that ECLiPSe can return an answer,
in terms of instantiation of previously unbound variables, to Smodels. This
instantiation is converted into a truth value for atoms in Smodels and
then the control returns to the expand module again. E.g., if the constraint
(X[12]*2 + X[13]*4 #< 5) #<=> B is posted to the store during initialization
(corresponding to the aggregate sum({{X : p(X)}}) < 4) and X[12] #= 1 has
been previously posted (i.e., p(2) is true), then requiring the aggregate to
be true (by posting B #= 1) will force X[13] #= 0, i.e., p(3) to be false
(Fig. 2, post operation (5)). If there are more answers for the aggregate
constraint, the control must return to ECLiPSe for backtracking and
generating another answer; this happens after Smodels computes the stable
model containing the previous answer or fails to get a stable model containing
the previous answer and backtracks.

• Inconsistencies in the constraint store have to be propagated to Smodels.
E.g., if we have (X[12] + X[13] + X[14] #> 2) #<=> B1 (corresponding
to count({{X : p(X)}}) > 2) and X[13] #= 0 (corresponding to p(2) being
false), and we finally request the aggregate to be true (posting B1 #= 1), then
ECLiPSe will return a failure, that will activate backtracking in Smodels.
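
Both kinds of feedback can be reproduced on the two constraints quoted above. The sketch below is an assumption-laden stand-in: exhaustive enumeration over 0/1 values replaces the propagation of the ECLiPSe finite domain solver (which is generally weaker and incremental), and the names forced_values, sum_reified and count_reified are hypothetical.

```python
from itertools import product

def forced_values(variables, constraint, fixed):
    """Values forced on the unfixed 0/1 variables, or None if the store is inconsistent."""
    free = [v for v in variables if v not in fixed]
    solutions = []
    for values in product((0, 1), repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, values))}
        if constraint(assignment):
            solutions.append(assignment)
    if not solutions:
        return None                      # plays the role of a failure
    first = solutions[0]
    return {v: first[v] for v in free
            if all(s[v] == first[v] for s in solutions)}

def sum_reified(a):
    # (X[12]*2 + X[13]*4 #< 5) #<=> B
    return (a["X12"] * 2 + a["X13"] * 4 < 5) == (a["B"] == 1)

def count_reified(a):
    # (X[12] + X[13] + X[14] #> 2) #<=> B1
    return (a["X12"] + a["X13"] + a["X14"] > 2) == (a["B1"] == 1)

# First bullet: X[12] = 1 and B = 1 leave only X[13] = 0.
print(forced_values(["X12", "X13", "B"], sum_reified, {"X12": 1, "B": 1}))
# Second bullet: X[13] = 0 and B1 = 1 cannot both hold.
print(forced_values(["X12", "X13", "X14", "B1"], count_reified, {"X13": 0, "B1": 1}))
```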

Check aggregate completion: An aggregate literal may become true/false not
only as the result of the deductive closure computation of the Smodels expand
procedure, but also because enough evidence has been accumulated to prove
its status. In this case, every time an aggregate dependent atom is made true
or false, the aggregate literal it appears in should be checked for
truth/falsity. The test can be simply performed by verifying the value of the
variable B_i attached to the reification of the aggregate constraint. If the
value of B_i is 1 (0), then the aggregate can be immediately evaluated to true
(false), regardless of the still unknown truth values of the rest of its
dependent atoms. E.g., if the constraint
(X[16]*1 + X[17]*2 + X[14]*3 #> 2) #<=> B2 is posted to the store
(corresponding to the aggregate sum({{X : q(X)}}) > 2) and X[14] #= 1 (i.e.,
q(3) is true), then in this case ECLiPSe instantiates B2 to 1, which should be
translated to a true value for the atom representing the aggregate in Smodels
(while q(1) and q(2) are still unknown) (Fig. 2, check operation (7)).
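
The completion check amounts to asking whether the reified variable is already decided by the atoms known so far. A minimal sketch of that idea, assuming a sum with nonnegative coefficients and the relation ">" only, uses lower and upper bounds of the sum; the function name reified_status is hypothetical, and this bound reasoning is an illustration rather than a description of ECLiPSe internals.

```python
def reified_status(coeffs, fixed, bound):
    """Status of (sum of coeffs[i] * X[i] > bound): 1, 0, or None while undecided.

    coeffs maps atom indices to nonnegative values; fixed maps the indices of
    atoms with known truth value to 0 or 1.
    """
    lo = hi = 0
    for idx, c in coeffs.items():
        if idx in fixed:
            lo += c * fixed[idx]
            hi += c * fixed[idx]
        else:
            hi += c            # an unknown atom can add at most c
    if lo > bound:
        return 1               # true whatever the remaining atoms turn out to be
    if hi <= bound:
        return 0               # can no longer become true
    return None

coeffs = {16: 1, 17: 2, 14: 3}                   # q(1), q(2), q(3)
print(reified_status(coeffs, {14: 1}, 2))        # 1: q(3) alone already exceeds 2
print(reified_status(coeffs, {}, 2))             # None: nothing known yet
```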

Pick: The structure of the computation developed by Smodels is reflected in the
structure of the constraint store (see Fig. 2). In particular, each time Smodels
generates a choice point (e.g., as effect of guessing the truth value of an atom),
a corresponding choice point has to be generated in the constraint store (see
Fig. 2, post operation (6)). Similarly, whenever Smodels detects a conflict and
initiates backtracking, a failure has to be triggered in the store as well. Observe
that choice points and failures can be easily generated in the constraint store
using the repeat and fail predicates of ECLiPSe. In our ASP-CLP(Agg) system,
we have extended the Smodels pick module to allow aggregate atoms to be picked
and their truth values guessed in the same manner as for non-aggregate atoms.
Aggregate atoms that are picked are necessarily non-completed aggregate atoms
since, as mentioned previously, aggregate atoms are checked for completion
every time a dependent atom is made true or false. The picked aggregate atom
is set to true by simply posting the constraint B #= 1, where B is the
variable associated to the reification of the aggregate.

Fig. 2. Communication between Smodels and ECLiPSe

As mentioned, a choice point is generated (using the repeat predicate) in
the ECLiPSe constraint store before posting the picked aggregate. If a conflict
is detected, it is propagated to the ECLiPSe constraint store (by posting the
fail constraint to the constraint store) where a failure is generated to force
backtracking to the choice point. Backtracking to a choice point will require
posting the complementary constraint to the constraint store - e.g., if originally
the constraint generated was X[i] #= 1 (B #= 1) then upon backtracking the
constraint X[i] #= 0 (B #= 0) will be posted (see the post operation (6) in
Fig. 2). If no conflicts are detected, Smodels continues the computation of
the model, and backtracking then takes place to construct a new model. At
this point the control returns to ECLiPSe, where a new answer is generated.

ASP-CLP(Agg) supports two modalities for picking aggregate atoms. Under
the lazy modality, the truth value of an aggregate atom is guessed by simply
instantiating the variable associated to the corresponding reification. E.g., if we
want to guess the truth value of the aggregate count({{X : p(X)}}) < 2, which
was initially reified as (X[3] + X[5] + X[6] #< 2) #<=> B, then the pick
operation will simply generate a choice point and post the constraint B #= 1
(and B #= 0 during backtracking). Note that this may not immediately provide
enough information to the constraint solver to produce values for the variables
X[3], X[5], X[6]. Under the eager modality, we expect the constraint solver to
immediately start enumerating the possible variable instantiations satisfying the
constraint; thus when the aggregate is picked, we also request the constraint
store to label the variables in the constraint (Fig. 2, post operation (8)). In the
previous example, when the aggregate is picked we not only request its constraint
to be true (by posting B #= 1) but we also post a labeling([X[3], X[5], X[6]]).
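
The effect of the eager modality on the example constraint can be sketched by enumerating the labelings directly. This is a hypothetical stand-in for posting B #= 1 followed by a labeling of X[3], X[5], X[6]; the function name eager_labelings is not part of the system, and the enumeration order shown is simply that of itertools.product.

```python
from itertools import product

def eager_labelings(indices, bound):
    """All 0/1 labelings of the listed variables satisfying sum < bound (B = 1)."""
    result = []
    for values in product((0, 1), repeat=len(indices)):
        if sum(values) < bound:
            result.append(dict(zip(indices, values)))
    return result

# count({{X : p(X)}}) < 2 over X[3], X[5], X[6]
for labeling in eager_labelings([3, 5, 6], 2):
    print(labeling)
```

Under the lazy modality none of these alternatives is enumerated when the aggregate is picked: only B is bound, and the three variables stay open until later truth assignments narrow them down.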

Evaluation. The implementation of the resulting system has been completed,
and it is available at www.cs.nmsu.edu/~ielkaban/smodels-ag.html. The current
prototype, built using Smodels and ECLiPSe, is stable and has been used to
successfully implement a number of benchmark programs. The execution speed is
good, thanks to the good implementation of the ECLiPSe interface (which limits
the communication overhead between the two systems). Furthermore, the system
has demonstrated excellent ability to reduce the search space for programs
that contain a number of aggregates related to the same predicates - their
representations as constraints and the propagation mechanisms of ECLiPSe allow
the system to automatically prune a number of irrelevant alternatives. Work is
in progress to optimize the implementation and to perform formal comparisons
with other relevant systems (e.g., Smodels using cardinality constraints and
DLV).

6 Example 

The problem [13, 10] is to send out party invitations, considering that some
people will accept the invitation only if at least k of their friends accept it
too. In the program below, Mary will accept the invitation iff Sue does as well.
According to the semantics of our language we expect two situations. In the
first, there is bad communication between Mary and Sue, a deadlock occurs, and
neither of them accepts the invitation. In the second, both of them accept
simultaneously. The relation requires(X, K) is true when an invited person X
requires at least K of her friends to accept.

requires(ann,0). requires(rose,0). requires(mary,1). requires(sue,1).
friend(mary,sue). friend(sue,mary).
coming(X) :- requires(X,0).
coming(X) :- requires(X,K), count({{Y : kc(X,Y)}}) >= K.
kc(X,Y) :- friend(X,Y), coming(Y).

Two models are returned, one containing coming(ann), coming(rose) and one
containing coming(mary), coming(sue), coming(ann), coming(rose).
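
The two models can be reproduced on a hand-grounded version of the program. The sketch below rests on assumptions: the grounding is written out manually rather than produced by the preprocessor, the count aggregates are evaluated against each candidate set and treated as facts when true, and the names FRIEND, least_model and stable_models are hypothetical.

```python
from itertools import combinations

FRIEND = {"mary": "sue", "sue": "mary"}    # the people who require one friend
ATOMS = ["coming(ann)", "coming(rose)", "coming(mary)", "coming(sue)",
         "kc(mary,sue)", "kc(sue,mary)"]

def least_model(rules):
    # Least fixpoint of a ground, negation-free program given as (head, body) pairs.
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

def stable_models():
    models = []
    for k in range(len(ATOMS) + 1):
        for cand in combinations(ATOMS, k):
            m = set(cand)
            rules = [("coming(ann)", []), ("coming(rose)", []),
                     ("kc(mary,sue)", ["coming(sue)"]),
                     ("kc(sue,mary)", ["coming(mary)"])]
            for person, friend in FRIEND.items():
                # count({{Y : kc(person,Y)}}) >= 1, evaluated against m
                if f"kc({person},{friend})" in m:
                    rules.append((f"coming({person})", []))
            if least_model(rules) == m:
                models.append(sorted(a for a in m if a.startswith("coming")))
    return models

print(stable_models())
```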

7 Discussion 

Various proposals have been put forward to provide alternative semantics for 
logic programming with aggregates [4,13]. A natural alternative semantics, 
which removes the presence of non-minimal models, can be defined as follows. 







Definition 5. Let us consider a ground aggregate literal α of the form
F{{X : Goal[X,Y]}} op Result. Let us denote with S(α) the following set:

S(α) = { {(a_1,b_1), ..., (a_n,b_n)} | a_i, b_i are ground terms,
         A_c ⊨ (F{{a_1, ..., a_n}} op Result) }

We will refer to S(α) as the Aggregate Solution Set.

Definition 6 (Aggregate Unfolding). Let α be the ground aggregate
F{{X : Goal[X,Y]}} op Result. We define the unfolding of α (unfold(α)) as the
set

unfold(α) = { ∧_{(a,b) ∈ S} Goal[a,b] ∧ ∧_{(a,b) ∉ S} not Goal[a,b] | S ∈ S(α) }

For a non-aggregate literal A, we have unfold(A) = {A}. The unfolding of a
clause H :- B_1, ..., B_n is defined as the set of clauses:

unfold(H :- B_1, ..., B_n) = { (H :- β_1, ..., β_n) | β_i ∈ unfold(B_i), 1 ≤ i ≤ n }

The unfolding of a program P (unfold(P)) is obtained by unfolding each clause.

Definition 7 (Alternative Stable Model Semantics for Aggregates). Let M be
an Herbrand interpretation and let P be a program with aggregates. M is
a stable model of P iff M is a stable model of unfold(P).

Example 3. Consider the program of Example 2. The unfolding of this program
yields a program which is identical except for the last rule, which becomes
q :- p(1), p(2), p(3), p(5), since {{1,2,3,5}} is the only multiset that
satisfies the aggregate. The resulting program has a single answer set:
{p(1),p(2),p(3)}; thus the non-minimal model accepted in the former semantic
characterization (see Example 2) is no longer a stable model of the program.
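
The unfolding in Example 3 is small enough to compute explicitly. The following sketch is illustrative only: it enumerates the aggregate solution set for sum > 10 over the ground instances p(1), p(2), p(3), p(5), builds the unfolded bodies for q, and then takes the least model of the unfolded program, which is enough here because no negated literal arises. All function names are hypothetical.

```python
from itertools import combinations

VALUES = [1, 2, 3, 5]          # X such that p(X) is a ground atom of the program

def solution_set(values, bound):
    """Every selection of ground instances whose sum exceeds bound."""
    return [set(s) for k in range(len(values) + 1)
            for s in combinations(values, k) if sum(s) > bound]

def unfold_q(values, bound):
    """One (positive atoms, negated atoms) body per element of the solution set."""
    bodies = []
    for s in solution_set(values, bound):
        pos = sorted(f"p({x})" for x in s)
        neg = sorted(f"p({x})" for x in values if x not in s)
        bodies.append((pos, neg))
    return bodies

def least_model(rules):
    # Least fixpoint of a ground, negation-free program given as (head, body) pairs.
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

bodies = unfold_q(VALUES, 10)
assert all(not neg for _, neg in bodies)      # no negation in this example
rules = [("p(1)", []), ("p(2)", []), ("p(3)", []), ("p(5)", ["q"])]
rules += [("q", pos) for pos, _ in bodies]
print(bodies)
print(sorted(least_model(rules)))
```

Because q now depends on p(5) and p(5) depends on q, neither can be derived, which is exactly how the unfolding removes the non-minimal model.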

This alternative semantics can be supported with minimal changes in the
proposed system. The construction and handling of the constraints encoding
aggregate computations is unchanged. The only changes are in the management of
the deductive closure computation in presence of aggregates within Smodels. The
presence of non-minimal models derives from true aggregates being treated as
facts, losing the dependencies between the aggregate and the atoms it depends
on. These dependencies can be restored by dynamically introducing a rule upon
satisfaction of an aggregate - where the body of the rule includes the true atoms
satisfying the aggregate (readily available from the preprocessor).

8 Conclusions and Future Work 

A prototype implementing these ideas has been completed and used on a pool of 
benchmarks. Performance is acceptable, but we expect significant improvements 
by refining the interface with ECLiPSe. Combining a constraint solver with 
Smodels brings many advantages:

• since we are relying on an external constraint solver to effectively handle the
aggregates, the only step required to add new aggregates (e.g., times, avg) is
the generation of the appropriate constraint formula during preprocessing;

• the constraint solvers are very flexible; e.g., using CHRs we can implement 
different strategies to handle constraints and new constraint operators; 

• it is a straightforward extension to allow the user to declare aggregate in- 
stances as eager ; in this case, instead of posting only the corresponding con- 
straint to the store, we will also post a labeling, forcing the immediate reso- 
lution of the constraint store (i.e., guess the possible combinations of truth 
values of selected atoms involved in the aggregate). In this way, the aggregate 
will act as a generator of solutions instead of just a pruning mechanism. 

We believe this approach has advantages over previous proposals. The use
of a general constraint solver allows us to easily understand and customize the
way aggregates are handled (e.g., allow the user to select eager vs. non-eager
treatment); it also allows us to easily extend the system to include new forms of
aggregates, by simply adding new types of constraints. Furthermore, the current
approach relaxes some of the syntactic restrictions imposed in other proposals
(e.g., stratification of aggregations). The implementation requires minimal
modifications to Smodels and introduces insignificant overheads for regular
programs. The prototype confirmed the feasibility of this approach.

In our future work, we propose to relax some of the syntactic restrictions 
- e.g., the use of labeling allows the aggregates to “force” solutions, so that 
the aggregate can act as a generator of values; this may remove the need for 
domain predicates for the result of the aggregate (e.g., the safety condition used 
in DLV).


Abstract. Constraint Handling Rules (CHRs) are a high-level rule-based
programming language commonly used to write constraint solvers.
The theoretical operational semantics for CHRs is highly non-deter- 
ministic and relies on writing confluent programs to have a meaningful 
behaviour. Implementations of CHRs use an operational semantics which 
is considerably finer than the theoretical operational semantics, but is 
still non-deterministic (from the user’s perspective). This paper formally 
defines this refined operational semantics and proves it implements the 
theoretical operational semantics. It also shows how to create a (partial) 
confluence checker capable of detecting programs which are confluent un- 
der this semantics, but not under the theoretical operational semantics. 
This supports the use of new idioms in CHR programs. 



1 Introduction 

Constraint Handling Rules (CHRs) are a high-level rule-based programming lan- 
guage commonly used to write constraint solvers. The theoretical operational 
semantics of CHRs is relatively high level with several choices, such as the or- 
der in which transitions are applied, left open. Therefore, only confluent CHR 
programs, where every possible execution results in the same result, have a guar- 
anteed behaviour. 

This paper looks at the refined operational semantics, a more specific op- 
erational semantics which has been implicitly described in [10,11], and is used 
by every Prolog implementation of CHRs we know of. Although some choices 
are still left open in the refined operational semantics, both the order in which 
transitions are applied and the order in which occurrences are visited, is decided. 
Unsurprisingly, the decisions follow Prolog style and maximise efficiency of ex- 
ecution. The remaining choices, which matching partner constraints are tried 
first, and the order of evaluation of CHR constraints awoken by changes in
variables they involve, are left as choices for two reasons. First, it is very
difficult to see how a CHR programmer will be able to understand a fixed
strategy in these cases. Second, implementing a fixed strategy will restrict
the implementation to be less efficient, for example by disallowing hashing
index structures.

It is clear that CHR programmers take the refined operational semantics into 
account when programming. For example, some of the standard CHR examples 
are non-terminating under the theoretical operational semantics. 

Example 1. Consider the following simple program that calculates the greatest
common divisor (gcd) between two integers using Euclid’s algorithm:

gcd1 @ gcd(0) <=> true.
gcd2 @ gcd(N) \ gcd(M) <=> M >= N | gcd(M-N).

Rule gcd1 is a simplification rule. It states that a fact gcd(0) in the store can
be replaced by true. Rule gcd2 is a simpagation rule; it states that if there are
two facts in the store gcd(n) and gcd(m) where m ≥ n, we can replace the part
after the slash gcd(m) by the right hand side gcd(m - n)¹. The idea of this
program is to reduce an initial store of gcd(A), gcd(B) to a single constraint
gcd(C) where C will be the gcd of A and B.

This program, which appears on the CHR webpage [6], is non-terminating under
the theoretical operational semantics. Consider the constraint store gcd(3),
gcd(0). If the first rule fires, we are left with gcd(3) and the program terminates.
If, instead, the second rule fires (which is perfectly possible in the theoretical
semantics), gcd(3) will be replaced with gcd(3-0) = gcd(3), thus essentially
leaving the constraint store unchanged. If the second rule is applied indefinitely
(assuming unfair rule application), we obtain an infinite loop.

In the above example, trivial non-termination can be avoided by using a fair 
rule application (i.e. one in which every rule that could fire, eventually does). 
Indeed, the theoretical operational semantics given in [7] explicitly states that 
rule application should be fair. Interestingly, although the refined operational 
semantics is not fair (it uses rule ordering to determine rule application), its 
unfairness ensures termination in the gcd example above. Of course, it could 
also have worked against it, since swapping the order of the rules would lead to
non-termination.
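
The effect of rule order on the gcd program can be observed with a small simulation. The sketch below is a coarse approximation and not the refined operational semantics itself: it repeatedly fires the first applicable rule in a given textual order on a list-based store, picks partner constraints in a fixed arbitrary way, and uses a step budget (max_steps, an assumption of this sketch) to detect a run that does not stop.

```python
def run_gcd(store, rule_order, max_steps=100):
    """Return the final store, or None if the step budget is exhausted."""
    store = list(store)
    for _ in range(max_steps):
        for rule in rule_order:
            if rule == "gcd1" and 0 in store:
                store.remove(0)                      # gcd(0) <=> true
                break
            if rule == "gcd2":
                # gcd(N) \ gcd(M) <=> M >= N | gcd(M-N), on two distinct constraints
                pair = next(((i, j) for i in range(len(store))
                             for j in range(len(store))
                             if i != j and store[j] >= store[i]), None)
                if pair is not None:
                    i, j = pair
                    store[j] = store[j] - store[i]
                    break
        else:
            return store                             # no rule applies any more
    return None

print(run_gcd([9, 6], ["gcd1", "gcd2"]))   # [3]
print(run_gcd([3, 0], ["gcd1", "gcd2"]))   # [3]
print(run_gcd([3, 0], ["gcd2", "gcd1"]))   # None: gcd2 keeps rewriting gcd(3) to gcd(3)
```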

The refined operational semantics allows us to use more programming idioms, 
since we can now treat the constraint store as a queryable data structure. 

Example 2. Consider a CHR implementation of a simple database: 

l1 @ entry(Key,Val) \ lookup(Key,ValOut) <=> ValOut = Val.

l2 @ lookup(_,_) <=> fail.

where the constraint lookup represents the basic database operation of key
lookup, and entry represents a piece of data currently in the database (an entry
in the database). Rule l1 looks for the matching entry to a lookup query and
returns in ValOut the stored value. Rule l2 causes a lookup to fail if there is no
matching entry. Clearly the rules are non-confluent in the theoretical operational 
semantics, since they rely on rule ordering to give the intended behaviour. 

1 Unlike Prolog, we assume the expression “m-n” is automatically evaluated.

The refined operational semantics also allows us to create more efficient
programs and/or have a better idea regarding their time complexity.

Example 3. Consider the following implementation of Fibonacci numbers,
fib(N,F), which holds if F is the Nth Fibonacci number:

f1 @ fib(N,F) <=> 1 >= N | F = 1.
f2 @ fib(N,F0) \ fib(N,F) <=> N >= 2 | F = F0.
f3 @ fib(N,F) ==> N >= 2 | fib(N-2,F1), fib(N-1,F2), F = F1 + F2.

Rule f3 is a propagation rule (as indicated by the ==> arrow), which is similar 
to a simplification rule except the matching constraint fib(n,f) is not removed
from the store. 

The program is confluent in the theoretical operational semantics which, as 
we will see later, means it is also confluent in the refined operational semantics. 
Under the refined operational semantics it has linear complexity, while swapping 
rules f2 and f3 leads to exponential complexity. Since in the theoretical
operational semantics both versions are equivalent, complexity is at best exponential.
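The complexity gap can be illustrated by counting how often the body of f3 is executed. The sketch below is a simplified model and not the CHR program itself: a dictionary stands in for the stored fib constraints, the flag `lookup_first` (a hypothetical name) models whether f2 is tried before f3, and fib(0) = fib(1) = 1 as in rule f1.

```python
def fib_calls(n, lookup_first):
    """Return (fib(n), number of times the body of f3 is executed)."""
    store = {}      # models stored fib(N,F) constraints whose F is known
    fired = 0

    def call(k):
        nonlocal fired
        if k <= 1:                        # f1: 1 >= N | F = 1
            return 1
        if lookup_first and k in store:   # f2 tried before f3: reuse stored value
            return store[k]
        fired += 1                        # f3 fires and runs its body
        value = call(k - 2) + call(k - 1)
        store[k] = value
        return value

    return call(n), fired


print(fib_calls(10, lookup_first=True))   # f2 before f3
print(fib_calls(10, lookup_first=False))  # f3 before f2
```

For N = 10 the first variant runs the body of f3 once for each of N = 2, ..., 10, while the second runs it 88 times; the count grows linearly in the first case and exponentially in the second.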

We believe that Constraint Handling Rules under the refined operational se- 
mantics provide a powerful and elegant language suitable for general purpose 
computing. However, to make use of this language, authors need support to 
ensure their code is confluent within this context. In order to do this, we first 
provide a formal definition of the refined operational semantics of CHRs as im- 
plemented in logic programming systems. We then provide theoretical results 
linking the refined and theoretical operational semantics. Essentially, these re- 
sults ensure that if a program is confluent under the theoretical semantics, it is 
also confluent under the refined semantics. Then, we provide a practical (partial) 
confluence test capable of detecting CHR programs which are confluent for the 
refined operational semantics, even though they are not confluent for the theo- 
retical operational semantics. Finally, we study two CHR programs and argue 
our test is sufficient for real world CHR programs. 

2 The Theoretical Operational Semantics $\omega_t$

We begin by defining constraints, rules and CHR programs. For our purposes,
a constraint is simply defined as an atom $p(t_1, \ldots, t_n)$ where $p$ is some predicate
symbol of arity $n \geq 0$ and $(t_1, \ldots, t_n)$ is an $n$-tuple of terms. A term is defined
as either a variable $X$, or as $f(t_1, \ldots, t_n)$ where $f$ is a function symbol of arity
$n$ and $t_1, \ldots, t_n$ are terms. Let $vars(A)$ return the variables occurring in any
syntactic object $A$. We use $\bar{\exists}_A F$ to denote the formula $\exists X_1 \cdots \exists X_n F$ where
$\{X_1, \ldots, X_n\} = vars(F) - vars(A)$. We use $\bar{s} = \bar{t}$, where $\bar{s}$ and $\bar{t}$ are sequences,
to denote the conjunction $s_1 = t_1 \wedge \cdots \wedge s_n = t_n$.

Constraints can be divided into either CHR constraints or builtin constraints
in some constraint domain $\mathcal{D}$. While the former are manipulated by the CHR
execution algorithm, the latter are handled by an underlying constraint solver.
Decisions about rule matchings will rely on the underlying solver proving that the
current constraint store for the underlying solver entails a guard (a conjunction
of builtin constraints). We will assume the solver supports (at least) equality.

There are three types of rules: simplification, propagation and simpagation.
For simplicity, we consider both simplification and propagation rules as special
cases of simpagation rules. The general form of a simpagation rule is:

$r \;@\; H_1 \setminus H_2 \Longleftrightarrow g \mid B$

where $r$ is the rule name, $H_1$ and $H_2$ are sequences of CHR constraints, $g$ is
a sequence of builtin constraints, and $B$ is a sequence of constraints. If $H_1$ is
empty, then the rule is a simplification rule. If $H_2$ is empty, then the rule is a
propagation rule. At least one of $H_1$ and $H_2$ must be non-empty. Finally, a CHR
program $P$ is a sequence of rules.

We use $[H|T]$ to denote the first ($H$) and remaining elements ($T$) of a
sequence, $+\!+$ for sequence concatenation, $\epsilon$ for empty sequences, and $\uplus$ for multiset
union. We shall sometimes treat multisets as sequences, in which case we
non-deterministically choose an order for the objects in the multiset.

Given a CHR program P, we will be interested in numbering the occurrences 
of each CHR constraint predicate p appearing in the head of the rule. We number 
the occurrences following the top-down rule order and right-to-left constraint 
order. The latter is aimed at ordering first the constraints after the backslash 
(\) and then those before it, since this gives the refined operational semantics a 
clearer behaviour. 

Example 4. The following shows the gcd CHR program of Example 1, written
using simpagation rules and with all its occurrences numbered (shown as _1, _2, _3):

gcd1 @ ε \ gcd(0)_1 <=> true | true.

gcd2 @ gcd(N)_3 \ gcd(M)_2 <=> M >= N | gcd(M-N).

2.1 The $\omega_t$ Semantics

Several versions of the theoretical operational semantics have already appeared 
in the literature, e.g. [1,7], essentially as a multiset rewriting semantics. This 
section presents our variation, which is equivalent to previous ones, but is close 
enough to our refined operational semantics to make proofs simple. 

Firstly, we define an execution state as the tuple $\langle G, S, B, T \rangle_n$ where each
element is as follows. The goal $G$ is the multiset (repeats are allowed) of
constraints to be executed. The CHR constraint store $S$ is the multiset of identified
CHR constraints that can be matched with rules in the program $P$. An identified
CHR constraint $c\#i$ is a CHR constraint $c$ associated with some unique integer
$i$. This number serves to differentiate among copies of the same constraint. We
introduce functions $chr(c\#i) = c$ and $id(c\#i) = i$, and extend them to sequences
and sets of identified CHR constraints in the obvious manner.

The builtin constraint store $B$ contains any builtin constraint that has been
passed to the underlying solver. Since we will usually have no information about
the internal representation of $B$, we will model it as an abstract logical
conjunction of constraints. The propagation history $T$ is a set of sequences, each
recording the identities of the CHR constraints which fired a rule, and the name
of the rule itself. This is necessary to prevent trivial non-termination for
propagation rules: a propagation rule is allowed to fire on a set of constraints only if
the constraints have not been used to fire the rule before. Finally, the counter $n$
represents the next free integer which can be used to number a CHR constraint.

Given an initial goal $G$, the initial state is $\langle G, \emptyset, true, \emptyset \rangle_1$. The theoretical
operational semantics $\omega_t$ is based on the following three transitions which map
execution states to execution states:

1. Solve $\langle \{c\} \uplus G, S, B, T \rangle_n \rightarrowtail \langle G, S, c \wedge B, T \rangle_n$ where $c$ is a builtin constraint.

2. Introduce $\langle \{c\} \uplus G, S, B, T \rangle_n \rightarrowtail \langle G, \{c\#n\} \uplus S, B, T \rangle_{n+1}$ where $c$ is a CHR
constraint.

3. Apply $\langle G, H_1 \uplus H_2 \uplus S, B, T \rangle_n \rightarrowtail \langle C \uplus G, H_1 \uplus S, \theta \wedge B, T' \rangle_n$ where there
exists a (renamed apart) rule in $P$ of the form

$r \;@\; H_1' \setminus H_2' \Longleftrightarrow g \mid C$

and a matching substitution $\theta$ such that $chr(H_1) = \theta(H_1')$, $chr(H_2) = \theta(H_2')$
and $\mathcal{D} \models B \rightarrow \bar{\exists}_B(\theta \wedge g)$, and the tuple $id(H_1) +\!+ id(H_2) +\!+ [r] \notin T$. In the
result $T' = T \cup \{id(H_1) +\!+ id(H_2) +\!+ [r]\}$.² □

The first rule tells the underlying solver to add a new builtin constraint to the
builtin constraint store. The second adds a new identified CHR constraint to the
CHR constraint store. The last one chooses a program rule for which matching
constraints exist in the CHR constraint store, and whose guard is entailed by
the underlying solver, and fires it. For readability, we usually apply the resulting
substitution $\theta$ to all relevant fields in the execution state, i.e. $G$, $S$ and $B$. This
does not affect the meaning of the execution state, or its transition applicability,
but it helps remove the build-up of too many variables and constraints.

The transitions are non-deterministically applied until either no more
transitions are applicable (a successful derivation), or the underlying solver can prove
$\mathcal{D} \models \neg\bar{\exists}_\emptyset B$ (a failed derivation). In both cases a final state has been reached.

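Example 5. The following is a (terminating) derivation under $\omega_t$ for the query
gcd(6), gcd(9) executed on the gcd program in Example 4. For brevity, $B$ and
$T$ have been removed from each tuple.

$$
\begin{array}{llll}
 & \langle \{gcd(6), gcd(9)\}, \emptyset \rangle_1 & (1) & \\
\rightarrowtail_{introduce} & \langle \{gcd(9)\}, \{gcd(6)\#1\} \rangle_2 & (2) & \\
\rightarrowtail_{introduce} & \langle \emptyset, \{gcd(6)\#1, gcd(9)\#2\} \rangle_3 & (3) & \\
\rightarrowtail_{apply} & \langle \{gcd(3)\}, \{gcd(6)\#1\} \rangle_3 & (4) & (gcd2\ N = 6 \wedge M = 9) \\
\rightarrowtail_{introduce} & \langle \emptyset, \{gcd(6)\#1, gcd(3)\#3\} \rangle_4 & (5) & \\
\rightarrowtail_{apply} & \langle \{gcd(3)\}, \{gcd(3)\#3\} \rangle_4 & (6) & (gcd2\ N = 3 \wedge M = 6) \\
\rightarrowtail_{introduce} & \langle \emptyset, \{gcd(3)\#3, gcd(3)\#4\} \rangle_5 & (7) & \\
\rightarrowtail_{apply} & \langle \{gcd(0)\}, \{gcd(3)\#3\} \rangle_5 & (8) & (gcd2\ N = 3 \wedge M = 3) \\
\rightarrowtail_{introduce} & \langle \emptyset, \{gcd(3)\#3, gcd(0)\#5\} \rangle_6 & (9) & \\
\rightarrowtail_{apply} & \langle \emptyset, \{gcd(3)\#3\} \rangle_6 & (10) & (gcd1)
\end{array}
$$

No more transition rules are possible, so this is the final state.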
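The derivation of Example 5 can be replayed step by step. The following sketch is an illustration under stated assumptions rather than an implementation of the semantics: the goal multiset is modelled as a tuple consumed left to right (one of the possible non-deterministic choices), constraints gcd(k) are represented by the integer k, B and T are omitted as in the example, and the names `State`, `introduce`, `apply_gcd1` and `apply_gcd2` are hypothetical.

```python
from collections import namedtuple

# Execution state <G, S>_n with B and T omitted, as in Example 5.
State = namedtuple("State", "goal store next_id")


def introduce(state):
    """Introduce: move one constraint from the goal to the store with a fresh id."""
    c, rest = state.goal[0], state.goal[1:]
    store = dict(state.store)
    store[state.next_id] = c
    return State(rest, store, state.next_id + 1)


def apply_gcd1(state, i):
    # Apply with gcd1 @ gcd(0) <=> true, on the stored constraint with id i.
    assert state.store[i] == 0
    store = {k: v for k, v in state.store.items() if k != i}
    return State(state.goal, store, state.next_id)


def apply_gcd2(state, i_n, i_m):
    # Apply with gcd2 @ gcd(N) \ gcd(M) <=> M >= N | gcd(M-N).
    n, m = state.store[i_n], state.store[i_m]
    assert i_n != i_m and m >= n       # two distinct constraints, guard holds
    store = {k: v for k, v in state.store.items() if k != i_m}
    return State(state.goal + (m - n,), store, state.next_id)


s = State((6, 9), {}, 1)     # (1)
s = introduce(s)             # (2)
s = introduce(s)             # (3)
s = apply_gcd2(s, 1, 2)      # (4)  N = 6, M = 9
s = introduce(s)             # (5)
s = apply_gcd2(s, 3, 1)      # (6)  N = 3, M = 6
s = introduce(s)             # (7)
s = apply_gcd2(s, 3, 4)      # (8)  N = 3, M = 3
s = introduce(s)             # (9)
s = apply_gcd1(s, 5)         # (10)
print(s)
```

The final value has an empty goal, a store containing only gcd(3) with identifier 3, and counter 6, which corresponds to state (10).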



2 Note that in practice we only need to keep track of tuples where $H_2$ is empty, since
otherwise these CHR constraints are being deleted and the firing cannot reoccur.







3 The Refined Operational Semantics $\omega_r$

The refined operational semantics establishes an order for the constraints in G. 
As a result, we are no longer free to pick any constraint from G to either Solve 
or Introduce into the store. It also treats CHR constraints as procedure calls:
each newly added active constraint searches for possible matching rules in order, 
until all matching rules have been executed or the constraint is deleted from the 
store. As with a procedure, when a matching rule fires other CHR constraints 
might be executed and, when they finish, the execution returns to finding rules 
for the current active constraint. Not surprisingly, this approach is used exactly 
because it corresponds closely to that of the language we compile to. 

Formally, the execution state of the refined semantics is the tuple $\langle A, S, B, T \rangle_n$
where $S$, $B$, $T$ and $n$, representing the CHR store, builtin store, propagation
history and next free identity number respectively, are exactly as before. The
execution stack $A$ is a sequence of constraints, identified CHR constraints and
occurrenced identified CHR constraints, with a strict ordering in which only the
top-most constraint is active. An occurrenced identified CHR constraint $c\#i : j$
indicates that only matches with occurrence $j$ of constraint $c$ should be
considered when the constraint is active. Unlike in the theoretical operational
semantics, the same identified constraint may simultaneously appear in both the
execution stack $A$ and the store $S$.

Given initial goal G, the initial state is as before. Just as with the theoretical 
operational semantics, execution proceeds by exhaustively applying transitions 
to the initial execution state until the builtin solver state is unsatisfiable or no 
transitions are applicable. The possible transitions are as follows: 

1. Solve $\langle [c|A], S_0 \uplus S_1, B, T \rangle_n \rightarrowtail \langle S_1 +\!+ A, S_0 \uplus S_1, c \wedge B, T \rangle_n$ where $c$ is
a builtin constraint, and $vars(S_0) \subseteq fixed(B)$, where $fixed(B)$ is the set of
variables fixed by $B$.³ This reconsiders constraints whose matches might be
affected by $c$.

2. Activate $\langle [c|A], S, B, T \rangle_n \rightarrowtail \langle [c\#n : 1|A], \{c\#n\} \uplus S, B, T \rangle_{n+1}$ where $c$ is
a CHR constraint (which has never been active).

3. Reactivate $\langle [c\#i|A], S, B, T \rangle_n \rightarrowtail \langle [c\#i : 1|A], S, B, T \rangle_n$ where $c$ is a CHR
constraint (re-added to $A$ by Solve but not yet active).

4. Drop $\langle [c\#i : j|A], S, B, T \rangle_n \rightarrowtail \langle A, S, B, T \rangle_n$ where $c\#i : j$ is an occurrenced
active constraint and there is no such occurrence $j$ in $P$ (all existing ones have
already been tried thanks to transition 7).

5. Simplify $\langle [c\#i : j|A], \{c\#i\} \uplus H_1 \uplus H_2 \uplus H_3 \uplus S, B, T \rangle_n \rightarrowtail \langle C +\!+ A, H_1 \uplus S, \theta \wedge B, T' \rangle_n$
where the $j$th occurrence of the CHR predicate of $c$ in a (renamed
apart) rule in $P$ is

$r \;@\; H_1' \setminus H_2', d_j, H_3' \Longleftrightarrow g \mid C$

and there exists a matching substitution $\theta$ such that $c = \theta(d_j)$, $chr(H_1) = \theta(H_1')$,
$chr(H_2) = \theta(H_2')$, $chr(H_3) = \theta(H_3')$, and $\mathcal{D} \models B \rightarrow \bar{\exists}_B(\theta \wedge g)$, and
the tuple $id(H_1) +\!+ id(H_2) +\!+ [i] +\!+ id(H_3) +\!+ [r] \notin T$. In the result $T' =
T \cup \{id(H_1) +\!+ id(H_2) +\!+ [i] +\!+ id(H_3) +\!+ [r]\}$.

6. Propagate $\langle [c\#i : j|A], \{c\#i\} \uplus H_1 \uplus H_2 \uplus H_3 \uplus S, B, T \rangle_n \rightarrowtail \langle C +\!+ [c\#i : j|A], \{c\#i\} \uplus H_1 \uplus H_2 \uplus S, \theta \wedge B, T' \rangle_n$
where the $j$th occurrence of the CHR
predicate of $c$ in a (renamed apart) rule in $P$ is

$r \;@\; H_1', d_j, H_2' \setminus H_3' \Longleftrightarrow g \mid C$

and there exists a matching substitution $\theta$ such that $c = \theta(d_j)$, $chr(H_1) = \theta(H_1')$,
$chr(H_2) = \theta(H_2')$, $chr(H_3) = \theta(H_3')$, and $\mathcal{D} \models B \rightarrow \bar{\exists}_B(\theta \wedge g)$, and the
tuple $id(H_1) +\!+ [i] +\!+ id(H_2) +\!+ id(H_3) +\!+ [r] \notin T$. In the result $T' =
T \cup \{id(H_1) +\!+ [i] +\!+ id(H_2) +\!+ id(H_3) +\!+ [r]\}$.

The role of the propagation histories $T$ and $T'$ is exactly the same as with
the theoretical operational semantics, $\omega_t$.

7. Default $\langle [c\#i : j|A], S, B, T \rangle_n \rightarrowtail \langle [c\#i : j+1|A], S, B, T \rangle_n$ if the current
state cannot fire any other transition. □

3 $v \in fixed(B)$ if $\mathcal{D} \models \bar{\exists}_v(B) \wedge \bar{\exists}_{\rho(v)} \rho(B) \rightarrow v = \rho(v)$ for arbitrary renaming $\rho$.

The refined operational semantics is still non-deterministic. Its first source of
non-determinism is the Solve transition where the order in which constraints $S_1$
are added to the activation stack is still left open. The definition above (which
considers all non-fixed CHR constraints) is weak. In practice, only constraints
that may potentially cause a new rule to fire are re-added, see [5, 10] for more
details.

The other source of non-determinism occurs within the Simplify and
Propagate transitions, where we do not know which partner constraints ($H_1$, $H_2$ and
$H_3$) may be chosen for the transition, if more than one possibility exists.

Both sources of non-determinism could be removed by further refining the op- 
erational semantics, however we use non-determinism to model implementation 
specific behaviour of CHRs. For example, different CHR implementations use dif- 
ferent data structures to represent the store, and this may inadvertently affect 
the order partner constraints are matched against a rule. By leaving matching or- 
der non-deterministic, we capture the semantics of all current implementations. 
It also leaves more freedom for optimization of CHR execution (see e.g. [12]).

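Example 6. The following shows the derivation under the $\omega_r$ semantics for the gcd
program in Example 4 and the goal gcd(6), gcd(9). For brevity, $B$ and $T$ have
been eliminated and the substitutions $\theta$ applied throughout.

$$
\begin{array}{lll}
 & \langle [gcd(6), gcd(9)], \emptyset \rangle_1 & (1) \\
\rightarrowtail_{activate} & \langle [gcd(6)\#1 : 1, gcd(9)], \{gcd(6)\#1\} \rangle_2 & (2) \\
\rightarrowtail_{default}^{3} & \langle [gcd(6)\#1 : 4, gcd(9)], \{gcd(6)\#1\} \rangle_2 & (2) \\
\rightarrowtail_{drop} & \langle [gcd(9)], \{gcd(6)\#1\} \rangle_2 & (2) \\
\rightarrowtail_{activate} \rightarrowtail_{default}^{*} & \langle [gcd(9)\#2 : 2], \{gcd(9)\#2, gcd(6)\#1\} \rangle_3 & (3) \\
\rightarrowtail_{simplify} & \langle [gcd(3)], \{gcd(6)\#1\} \rangle_3 & (4) \\
\rightarrowtail_{activate} \rightarrowtail_{default}^{*} & \langle [gcd(3)\#3 : 3], \{gcd(3)\#3, gcd(6)\#1\} \rangle_4 & (5) \\
\rightarrowtail_{propagate} & \langle [gcd(3), gcd(3)\#3 : 3], \{gcd(3)\#3\} \rangle_4 & (6) \\
\rightarrowtail_{activate} \rightarrowtail_{default}^{*} & \langle [gcd(3)\#4 : 2, gcd(3)\#3 : 3], \{gcd(3)\#4, gcd(3)\#3\} \rangle_5 & (7) \\
\rightarrowtail_{simplify} & \langle [gcd(0), gcd(3)\#3 : 3], \{gcd(3)\#3\} \rangle_5 & (8) \\
\rightarrowtail_{activate} & \langle [gcd(0)\#5 : 1, gcd(3)\#3 : 3], \{gcd(0)\#5, gcd(3)\#3\} \rangle_6 & (9) \\
\rightarrowtail_{simplify} & \langle [gcd(3)\#3 : 3], \{gcd(3)\#3\} \rangle_6 & (10) \\
\rightarrowtail_{default} \rightarrowtail_{drop} & \langle \epsilon, \{gcd(3)\#3\} \rangle_6 & (10)
\end{array}
$$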
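The derivation above can be reproduced with a small interpreter specialised to the gcd program. This is a sketch under explicit assumptions, not the HAL or SICStus implementation: constraints gcd(k) are represented by the integer k, B and T are omitted (no builtin constraints are involved, so Solve and Reactivate never apply), partner constraints are chosen by smallest identifier, and the function names are hypothetical.

```python
def find_partner(store, active_id, ok):
    """Smallest identifier other than active_id whose argument satisfies ok."""
    for k in sorted(store):
        if k != active_id and ok(store[k]):
            return k
    return None


def run_refined(goal):
    stack = list(goal)   # execution stack A, top of the stack at index 0
    store = {}           # CHR store S: identifier -> argument of gcd
    next_id = 1          # counter n
    trace = []
    while stack:
        top = stack[0]
        if isinstance(top, int):                  # Activate
            store[next_id] = top
            stack[0] = (top, next_id, 1)
            next_id += 1
            trace.append("activate")
            continue
        c, i, j = top                             # occurrenced constraint c#i : j
        fired = False
        if i in store:
            if j == 1 and c == 0:                 # Simplify with gcd1
                del store[i]
                stack.pop(0)
                fired = True
                trace.append("simplify")
            elif j == 2:                          # active is gcd(M), deleted by gcd2
                k = find_partner(store, i, lambda n: c >= n)
                if k is not None:                 # Simplify with gcd2
                    del store[i]
                    stack[0] = c - store[k]
                    fired = True
                    trace.append("simplify")
            elif j == 3:                          # active is gcd(N), kept by gcd2
                k = find_partner(store, i, lambda m: m >= c)
                if k is not None:                 # Propagate with gcd2
                    m = store.pop(k)
                    stack.insert(0, m - c)
                    fired = True
                    trace.append("propagate")
        if not fired:
            if j > 3:                             # Drop: no occurrence j in P
                stack.pop(0)
                trace.append("drop")
            else:                                 # Default: try next occurrence
                stack[0] = (c, i, j + 1)
                trace.append("default")
    return store, next_id, trace


store, next_id, trace = run_refined([6, 9])
print(store, next_id)
print(trace)
```

For the goal gcd(6), gcd(9) the run ends with only gcd(3)#3 in the store and counter 6, and the recorded transition names follow the sequence of Example 6, with each starred default step expanded into its individual Default transitions.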
4 The Relationship Between the Two Semantics

Once both semantics are established, we can define an abstraction function $\alpha$
which maps execution states of $\omega_r$ to $\omega_t$ as follows:

$\alpha(\langle A, S, B, T \rangle_n) = \langle no\_id(A), S, B, T \rangle_n$

where $no\_id(A) = \{c \mid c \in A \text{ is not of the form } c\#i \text{ or } c\#i : j\}$.

Example 7. A state in Example 6 with number (N) is mapped to the state in
Example 5 with the same number. For example, the state $\langle [gcd(0), gcd(3)\#3 : 3], \{gcd(3)\#3\} \rangle_5$
corresponds to $\langle \{gcd(0)\}, \{gcd(3)\#3\} \rangle_5$ since both are
numbered (8).

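We now extend $\alpha$ to map a derivation $D$ in $\omega_r$ to the corresponding
derivation $\alpha(D)$ in $\omega_t$, by mapping each state appropriately and eliminating adjacent
equivalent states:

$$
\alpha(S_1 \rightarrowtail D) =
\begin{cases}
\alpha(D) & \text{if } D = S_2 \rightarrowtail D' \text{ and } \alpha(S_1) = \alpha(S_2) \\
\alpha(S_1) \rightarrowtail \alpha(D) & \text{otherwise}
\end{cases}
$$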
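A minimal sketch of the abstraction function, assuming a hypothetical state encoding: a state is a triple (stack, store, n), unidentified constraints gcd(k) on the stack are plain integers, identified or occurrenced ones are tuples, and B and T are omitted. The names `alpha_state` and `alpha_derivation` are illustrative only.

```python
def alpha_state(state):
    """Map an omega_r state to an omega_t state by dropping identified stack entries."""
    stack, store, n = state
    # no_id(A): keep only constraints that carry no identifier.
    goal = tuple(sorted(c for c in stack if not isinstance(c, tuple)))
    return (goal, frozenset(store.items()), n)


def alpha_derivation(states):
    """Map every state and collapse runs of adjacent equal abstract states."""
    result = []
    for s in states:
        a = alpha_state(s)
        if not result or result[-1] != a:
            result.append(a)
    return result


# The first states of Example 6, up to the state numbered (4).
derivation = [
    ([6, 9], {}, 1),
    ([(6, 1, 1), 9], {1: 6}, 2),
    ([(6, 1, 4), 9], {1: 6}, 2),
    ([9], {1: 6}, 2),
    ([(9, 2, 2)], {2: 9, 1: 6}, 3),
    ([3], {1: 6}, 3),
]
for abstract in alpha_derivation(derivation):
    print(abstract)
```

The six refined states collapse to four abstract states, matching states (1) to (4) of Example 5.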

We can now show that each $\omega_r$ derivation has a corresponding $\omega_t$ derivation,
and the final state of the $\omega_r$ derivation corresponds to a final state in the $\omega_t$ derivation.

Theorem 1 (Correspondence). Given a derivation $D$ under $\omega_r$, there
exists a corresponding derivation $\alpha(D)$ under $\omega_t$. If $S$ is the final state in $D$
then $\alpha(S)$ is a final state under $\omega_t$.

Theorem 1 shows that the refined operational semantics implements the
theoretical operational semantics. Hence, the soundness and completeness results
for CHRs under the theoretical operational semantics hold under the refined
operational semantics $\omega_r$.



4.1 Termination 

Termination of CHR programs is obviously a desirable property. Thanks to
Theorem 1, termination of $\omega_t$ programs ensures termination of $\omega_r$.

Corollary 1. If every derivation for $G$ in $\omega_t$ terminates, then every derivation
for $G$ in $\omega_r$ also terminates.

The converse is clearly not true, as shown in Example 1. In practice, proving 
termination for CHR programs under the theoretical operational semantics is 
quite difficult (see [8] for examples and discussion). It is somewhat simpler for 
the refined operational semantics but, just as with other programming languages, 
this is simply left to the programmer.

4.2 Confluence

Since both operational semantics of CHRs are non-deterministic, confluence of 
the program, which guarantees that whatever order the rules are applied in leads 
to the same result, is essential from a programmer’s point of view. Without it 
the programmer cannot anticipate the answer that will arise from a goal. 

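Formally, a CHR program $P$ is confluent under semantics $\omega$ if for any goal $G$
and any two derivations $\langle G, \emptyset, true, \emptyset \rangle_1 \rightarrowtail^* \langle \_, S_1, B_1, \_ \rangle_\_$ and
$\langle G, \emptyset, true, \emptyset \rangle_1 \rightarrowtail^* \langle \_, S_2, B_2, \_ \rangle_\_$ we have that
$\mathcal{D} \models \bar{\exists}_G(S_1 \wedge B_1) \leftrightarrow \bar{\exists}_G(S_2 \wedge B_2)$. That is, the resulting
constraint stores are equivalent.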

Confluence of the theoretical operational semantics of CHR programs has 
been extensively studied [1,2]. Abdennadher [1] provides a decidable confluence 
test for the theoretical semantics of terminating CHR programs. Essentially, it 
relies on computing critical pairs where two rules can possibly be used, and 
showing that each of the two resulting states lead to equivalent states. 

Just as with termination, thanks to Theorem 1, confluence under $\omega_t$ implies
confluence under $\omega_r$.

Corollary 2. If CHR program $P$ is confluent with respect to $\omega_t$, it is confluent
with respect to $\omega_r$.

5 Checking Confluence for $\omega_r$

One of the benefits of exposing the refined operational semantics is the ability to 
write and execute programs that are non-confluent with respect to the theoretical 
operational semantics, but are confluent with respect to the refined operational 
semantics. In order to take advantage of this, we need to provide a decidable 
test for confluence under $\omega_r$. This test must be able to capture a reasonable
number of programs which are confluent under $\omega_r$ but not under $\omega_t$. However,
this appears to be quite difficult. 

Example 8. Consider the following CHR program

p1 @ p <=> true.

p2 @ q(_), p <=> true.

Rule p2 looks suspiciously non-confluent since, if it were the only rule present,
the goal q(a), q(b), p could terminate with either q(a) or q(b) left in the store.
However, when combined with p1, p2 will never fire since any active p constraint
will be deleted by p1 before reaching p2. Thus, the program is $\omega_r$ confluent.

The example illustrates how extending the notion of critical pairs can be 
difficult, since many critical pairs will correspond to unreachable program states. 

As mentioned before, there are two sources of non-determinism in the
refined operational semantics. The first source, which occurs when deciding the
order in which the CHR constraints are added to the activation stack while
applying Solve, is hard to tackle. In practice, we will avoid re-activating most
CHR constraints in the store, by only considering those which might now cause
a rule to fire when it did not fire before (see [5,10] for more details).
However, if re-activation actually occurs, the programmer is unlikely to have any
control over the order in which re-activated constraints are re-executed. To avoid this
non-determinism we will require $S_1$ to be empty in any Solve transition. This
has in fact been the case for all the examples considered so far except fib, and all
those in Section 6, since all CHR constraints added to the store had fixed
arguments. Even for fib we could safely avoid reactivating the fib constraints whose
second arguments are not fixed, since these arguments have no relationship with
the guards.

For programs that really do interact with an underlying constraint solver, 
we have no better solution than relying on the confluence test of the theoretical 
operational semantics, for in this case it is very hard to see how the programmer 
can control execution sufficiently. 

The second source of non-determinism occurs when there is more than one
set of partner constraints in the CHR store that can be used to apply the
Simplify and Propagate transitions. We formalise this as follows. A matching of
occurrence $j$ with active CHR constraint $c$ in state $\langle [c\#i : j|A], S, B, T \rangle_n$ is the
sequence of identified constraints $H_1 +\!+ H_2 +\!+ H_3 +\!+ [c\#i]$ used in transitions
Simplify and Propagate. The goal of the matching is the right hand side of
the associated rule with the matching substitution applied, i.e., $\theta(C)$.

Non-confluence arises when multiple matchings exist for a rule R, and R is 
not allowed to eventually try them all. This can happen if firing R with one 
matching results in the deletion of a constraint in another matching. 

Definition 1. An occurrence $j$ in rule $R$ is matching complete if for all
reachable states $\langle [c\#i : j|A], S, B, T \rangle_n$ with $M_1, \ldots, M_m$ possible matchings and
$G_1, \ldots, G_m$ corresponding goals, firing $R$ for any matching $M_l$ and executing $G_l$ does not
result in the deletion of a constraint occurring in a different matching $M_k$, $k \neq l$.

Note that R itself may directly delete the active constraint. If so, R will only 
be matching complete if there is only one possible matching, i.e., m = 1. 

Example 9. Consider the CHR program in Example 2. The occurrence of entry
in rule l1 is matching complete since the lookup constraint is never stored (it is
deleted before it becomes inactive). This is not however the case for the
occurrence of lookup in l1. Goal entry(a,b), entry(a,c), lookup(a,V) will return
V=b or V=c depending on which of the two matchings of the occurrence of lookup
in l1 ([entry(a,b)#1, lookup(a,V)#3] or [entry(a,c)#2, lookup(a,V)#3])
is used, i.e., depending on which partner entry constraint is used for
lookup(a,V) in l1. The code is matching complete if the database only contains
one entry per key. Adding the rule

killdup @ entry(Key,Val1) \ entry(Key,Val2) <=> true.

which throws away duplicate entries for a key, provides a functional dependency
from Key to Val in entry(Key,Val). This rule makes the occurrence matching
complete, since only one matching will ever be possible.
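The role of the functional dependency can be illustrated with a toy model of the database. The sketch makes several assumptions: entries are (key, value) pairs in a list, the parameter `choose` (hypothetical) stands in for the implementation-specific choice of partner constraint, and `killdup` keeps the first entry per key, which is only one plausible reading of which duplicate survives.

```python
def lookup(entries, key, choose):
    """Model of l1/l2: `choose` picks among several matching partner entries."""
    matches = [val for k, val in entries if k == key]
    if not matches:
        return None            # l2: no matching entry, the lookup fails
    return choose(matches)     # l1: ValOut = Val for the chosen partner


def killdup(entries):
    """Model of killdup: keep a single entry per key (here, the first one seen)."""
    kept = {}
    for k, val in entries:
        kept.setdefault(k, val)
    return list(kept.items())


db = [("a", "b"), ("a", "c")]
first = lambda ms: ms[0]
last = lambda ms: ms[-1]

print(lookup(db, "a", first), lookup(db, "a", last))
print(lookup(killdup(db), "a", first), lookup(killdup(db), "a", last))
```

Without duplicate removal the answer depends on the partner chosen (b or c); once each key has a single entry, every choice gives the same answer.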
Matching completeness can also be broken if the body of a rule deletes
constraints from other matchings.

Example 10. Consider the following CHR program

r1 @ p, q(X) ==> r(X).
r2 @ p, r(a) <=> true.

The occurrence of p in r1 is not matching complete since the goal q(a), q(b), p
will obtain the final state q(a), q(b) or q(a), q(b), r(b) depending on which
partner constraint (q(a) or q(b)) is used for the occurrence of p in r1. This is
because the goal of the first matching (r(a)) deletes p.
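The two outcomes of Example 10 can be traced by hand-coding the behaviour of the active p constraint. This sketch is not a general CHR interpreter: the effect of rules r1 and r2 is written out directly, constraints are tuples, and `partner_order` (a hypothetical parameter) fixes the order in which the q partners are tried.

```python
def run_example10(partner_order):
    """Final store for goal q(a), q(b), p when p tries q partners in the given order."""
    store = [("q", "a"), ("q", "b"), ("p",)]
    for x in partner_order:
        if ("p",) not in store:       # p was deleted: remaining matchings are lost
            break
        store.append(("r", x))        # r1 fires: the body r(X) is added
        if x == "a":                  # r(a) becomes active and fires r2 with p
            store.remove(("p",))
            store.remove(("r", "a"))
    return sorted(store)


print(run_example10(["a", "b"]))
print(run_example10(["b", "a"]))
```

Trying q(a) first leaves q(a), q(b); trying q(b) first leaves q(a), q(b), r(b), as stated in the example.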

A matching complete occurrence is guaranteed to eventually try all possi- 
ble matchings for given execution state S. However, matching completeness is 
sometimes too strong if the user doesn’t care which matching is chosen. This is 
common when the body does not depend on the matching chosen. 

Example 11. Consider the following rule from a simple ray tracer.

shadow @ sphere(C,R,_) \ light_ray(L,P,_,_) <=> blocks(L,P,C,R) | true.

This rule calculates if point P is in shadow by testing if the ray from light L is
blocked by a sphere at C with radius R. Consider an active light_ray constraint:
there may be more than one sphere blocking the ray, but we do not care
which sphere blocks it, only whether some sphere does. This rule is not
matching complete but, since the matching chosen does not affect the resulting
state, it is matching independent.

Definition 2. A matching incomplete occurrence $j$ which is deleted by rule $R$
is matching independent if for all reachable states $\langle [c\#i : j|A], S, B, T \rangle_n$ with
$M_1, \ldots, M_m$ possible matchings and $G_1, \ldots, G_m$ corresponding goals, all the
final states for $\langle G_k, S_k, B, T_k \rangle_n$, $1 \leq k \leq m$, are equivalent, where $S_k$ is the store
after firing on matching $M_k$ and $T_k$ is the resulting history.

Suppose that a rule is matching complete, and there are multiple possible 
matchings. The ordering in which the matchings are tried is still chosen non- 
deterministically. Hence, there is still potential of non-confluence. For this reason 
we also require order independence, which ensures the choice of order does not 
affect the result. 

Definition 3. A matching complete occurrence $j$ in rule $R$ is order
independent if for all reachable states $\langle [c\#i : j|A], S, B, T \rangle_n$ with $M_1, \ldots, M_m$
possible matchings and $G_1, \ldots, G_m$ corresponding goals, the execution of the state
$\langle G_{\sigma(1)} +\!+ \cdots +\!+ G_{\sigma(m)}, S', B, T' \rangle_n$, where $S'$ is the CHR store $S$ where all
constraints deleted by any matching are deleted, and $T'$ has all sequences added by
all matchings, for any permutation $\sigma$, leads to equivalent states.

Note that, since $j$ is matching complete, $S'$ is well defined. Order
independence is a fairly strong condition and, currently, we have little insight as to how
to check it beyond a limited version of the confluence check for the theoretical
operational semantics. Thus, we currently require user annotations about order
independence. A matching complete occurrence, which may have more than one
matching, only passes the confluence checker if all CHR constraints called by
the body are annotated as order independent.

Example 12. Consider the following fragment for summing colors from a ray
tracer.

add1 @ add_color(C1), color(C2) <=> C3 = C1 + C2, color(C3).
add2 @ add_color(C) <=> color(C).

All occurrences of color and add_color are matching complete. Furthermore,
calling add_color(C1), ..., add_color(Cn) results in color(C1+...+Cn). Since
addition is commutative and associative, it does not matter in what order the
add_color constraints are called. Consider the occurrence of output in

render @ output(P) \ light_ray(_,P,C,_) <=> add_color(C).

Here, calling output(P) calculates the (accumulated) color at point P where any
light_rays (rays from a light source) may intersect. If there are multiple light
sources, then there may be multiple light_ray constraints. The order in which
add_color is called does not matter, hence the occurrence is order independent.
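Order independence of add_color can be checked exhaustively on a small instance. The sketch assumes colors are plain integers (a real ray tracer would more likely use vectors, and floating-point addition is not exactly associative), and models the store as a dictionary holding at most one color constraint; the function names are hypothetical.

```python
from itertools import permutations


def add_color(store, c):
    if "color" in store:
        # add1 @ add_color(C1), color(C2) <=> C3 = C1 + C2, color(C3).
        store["color"] = store["color"] + c
    else:
        # add2 @ add_color(C) <=> color(C).
        store["color"] = c
    return store


def final_colors(values):
    """Set of final colors reachable over all orders of calling add_color."""
    results = set()
    for order in permutations(values):
        store = {}
        for c in order:
            add_color(store, c)
        results.add(store["color"])
    return results


print(final_colors([1, 2, 4]))
```

Every permutation yields the same accumulated color, so the set of reachable results is a singleton.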

We now have sufficient conditions for a simple confluence test. 

Theorem 2 (Confluence Test). Let P be a CHR program such that: 

1. Starting from a fixed goal, any derived state is also fixed; 

2. All occurrences in rules are matching complete or matching independent; 

3. All matching complete occurrences in rules are order independent. 

Then $P$ is $\omega_r$ confluent for fixed goals.

The HAL CHR confluence checker implements partial tests for fixedness of 
CHR constraints, matching completeness and matching independence, and relies 
on user annotation for determining order independence. 

The confluence checker uses mode checking [9] to determine which CHR 
constraints are always fixed. A non-fixed constraint may also be safe, as long as 
it is never in the store when it is not active (such as lookup from Example 2). 
We call such constraints never stored. 

The confluence checker uses information about never stored constraints and
functional dependency analysis (see [12]) to determine how many possible
matchings (0, 1 or many) there are for each occurrence in a given rule. If there are
multiple possible matchings for an occurrence, it then checks that the removal of
other matching constraints is impossible, by examining the rule itself and using
a reachability analysis of the “call graph” for CHR rules, to determine if the
constraints could be removed by executing the body of the rule.

The checker determines matching independence by determining which
variables occurring in the body are functionally defined by the active occurrence,
making use of functional dependency analysis to do so. If all variables in the body
and the deleted constraints are functionally defined by the active occurrence, the
occurrence is matching independent.

Only bodies restricted to built-in constraints are considered as order indepen- 
dent by the current confluence checker. Otherwise, we rely on user annotation. 

6 Case Studies: Confluence Test 

This section investigates the confluence of two “real-life” CHR programs using 
our confluence checker. The programs are bounds - an extensible bounds prop- 
agation solver, and compiler - a new (bootstrapping) CHR compiler. Both were 
implemented with the refined operational semantics in mind, and simply will not 
work under the theoretical semantics. 

6.1 Confluence of bounds 

The bounds propagation solver is implemented in HAL and has a total of 83 
rules, 37 CHR constraints and 113 occurrences. An early version of a bounds
propagation solver first appeared in [12]. The current version also implements 
simple dynamic scheduling (i.e. the user can delay goals until certain conditions 
hold), as well as supporting ask constraints. This program was implemented 
before the confluence checker. 

The confluence checker finds 4 matching problems, and 3 order independence 
problems. One of the matching problems indicated a bug (see below); the
others are attributed to the weakness in the compiler’s analysis. We only had to
annotate one constraint as order independent. 

The confluence analysis complained that the following rule is matching
incomplete and non-independent when kill(Id) is active, since there are
(potentially) many possible matchings for the delayed_goals partner.

kill @ kill(Id), delayed_goals(Id,X,_,...,_) <=> true.

Here delayed_goals(Id,X,_,...,_) represents the delayed goals for bounds
solver variable X. The code should be

kill1 @ kill(Id) \ delayed_goals(Id,X,_,...,_) <=> true.
kill2 @ kill(_) <=> true.

This highlights how a simple confluence analysis can be used to discover bugs. 

The confluence analysis also complains about the rules for bounds propagation
themselves. The reason is that the constraint bounds(X,L,U), which stores
the lower L and upper U bounds of variable X, has complex self-interaction. Two
bounds constraints for the same variable can interact using, for example,

b2b @ bounds(X,L1,U1), bounds(X,L2,U2)
      <=> bounds(X,max(L1,L2),min(U1,U2)).

Here, the user must annotate the matching completeness and order independence 
of bounds. In fact, the relevant parts of the program are confluent within the 
theoretical operational semantics, but this is currently beyond the capabilities 
of our confluence analysis (and difficult because it requires bounds reasoning). 
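
The order independence of this self-interaction can be illustrated outside CHR. The following Python sketch is only an illustration under stated assumptions: the names merge_bounds and merge_all are hypothetical and are not part of the bounds solver. It mimics the effect of rule b2b on pairs (L,U) and checks that merging several bounds for one variable gives the same result in every order.

```python
from functools import reduce
from itertools import permutations


def merge_bounds(b1, b2):
    """Mimic rule b2b: two bounds for one variable become their intersection."""
    (l1, u1), (l2, u2) = b1, b2
    return (max(l1, l2), min(u1, u2))


def merge_all(bounds):
    """Merge a sequence of bounds left to right until a single pair remains."""
    return reduce(merge_bounds, bounds)


# Three bounds constraints on the same variable (values chosen for illustration).
store = [(0, 10), (3, 12), (1, 7)]

# Collect the final pair reached under every possible merging order.
results = {merge_all(list(order)) for order in permutations(store)}
```

Since max and min are associative and commutative, results contains a single pair, which is the intuition behind the claim that this part of the program is confluent under the theoretical semantics.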

6.2 Confluence of compiler

The bootstrapping compiler is implemented in SICStus Prolog (using the CHR 
library), including a total of 131 rules, 42 CHR constraints and 232 occurrences,
and performs most of the analysis and optimisations detailed in [12]. After boot- 
strapping it has similar speed to the original compiler written in Prolog and 
produces more efficient code due to the additional analysis performed. During 
the bootstrap, when compiling itself the first time, the new code outperformed 
the old code (the SICStus Prolog CHR compiler, 1100 lines of Prolog) by a fac- 
tor of five. This comparison is rather crude, measuring the costs and effects of 
the optimisations based on the additional analysis and the improved runtime 
system at once. Yet it demonstrates the practicality of the bootstrapping ap- 
proach for CHRs and that CHRs as a general purpose programming language 
under the refined semantics can be used to write moderately large sized verifiable 
programs. 

Bootstrapping CHRs as such aims at easier portability to further host lan- 
guages and as an internal reality check for CHRs as a general purpose program- 
ming system. To the best of our knowledge, the bootstrapping compiler is the 
largest single CHR program written by hand. (Automatic rule generators for 
constraint propagation algorithms [3] can produce large CHR programs too, but 
from the point of the compiler their structure is rather homogeneous in compar- 
ison to the compiler’s own code). 

The confluence checker finds 14 matching problems, and 45 order indepen- 
dence problems. 4 of the matching problems are removed by making functional 
dependencies explicit. The others are attributed to the weakness in the com- 
piler’s analysis. We had to annotate 18 constraints as order independent. 

7 Conclusion 

The refined operational semantics for Constraint Handling Rules provides a pow- 
erful and expressive language, ideal for applications such as compilers, since 
fixpoint computations and simple database operations are straightforward to 
program. In order to support programming for this language we need to help 
the author check the confluence of his program. In this paper we have defined a 
partial confluence checker that is powerful enough to check many idioms used in 
real programs. In the future we intend to extend this checker to better handle 
order independence, and to include the ability to check confluence with respect 
to the theoretical semantics. 
Abstract. In this paper we investigate how to extend a generic con-
straint solver that provides not only tell constraints (by adding the con- 
straint to the store) but also ask tests (by checking whether the con- 
straint is entailed by the store), with general ask constraints. Ask con- 
straints are important for implementing constraint implication, extensi- 
ble solvers using dynamic scheduling and reification. While the ask-test 
must be implemented by the solver writer, the compiler can extend this 
to provide ask behaviour for complex combinations of constraints, in- 
cluding constraints from multiple solvers. We illustrate the use of this 
approach within the HAL system. 



1 Introduction 

A constraint c of constraint domain $\mathcal{D}$ expresses relationships among variables
of $\mathcal{D}$. All constraint programming frameworks (such as CLP($\mathcal{D}$) [4,5]) use c
as a tell constraint, allowing the programmer to add the relationship to the
current constraint store C and check that the result is possibly satisfiable, i.e.,
$\mathcal{D} \models \exists (C \wedge c)$. However, some frameworks (such as cc($\mathcal{D}$) [6]) also use c as an
ask constraint, allowing the programmer to detect constraint stores C for which
the relationship already holds, i.e., $\mathcal{D} \models C \rightarrow c$.

Ask constraints are often used to control execution by associating them to 
some goal, which is to be executed if and when the associated ask constraints 
succeed (i.e., become entailed by the constraint store). 

Example 1. The following cc [6] definition of the constraint min(X,Y,Z)

min(X,Y,Z) :- X >= Y | Z = Y.
min(X,Y,Z) :- Y >= X | Z = X.

where Z is the minimum of X and Y is read as follows: when the min constraint
is executed, wait until one of the ask constraints to the left of the bar | holds
and then execute the tell constraint to the right of the bar. In the case of the
cc framework the implementation also encodes a commit: once one ask constraint
holds the other will never be reconsidered. The code implements a form
of implication whose logical reading is:

$min(x, y, z) \Leftrightarrow (x \geq y \rightarrow z = y) \wedge (y \geq x \rightarrow z = x)$ □
Note that it is not enough for the above framework to be able to test whether
a constraint c is entailed by the current constraint store (this one-off test will be 
referred to in the future as the ask-test). It also needs to detect changes in the 
constraint store that might affect the entailment of c, so that the ask-test can be 
re-executed. Hence, ask constraints are strongly connected to logical implication. 
In fact, it is this connection that makes them so useful for implementing many 
important language extensions, such as those involving constraint solvers. 

In this paper we consider a language that supports an ask construct of the 
form (F_1 ==> G_1 & ... & F_n ==> G_n), where each F_i is a complex formula over
constraints. The construct waits until some F_i is entailed by the store, and then
executes its associated goal G_i. Several other languages, such as SICStus and
ECLiPSe, implement related constructs for dynamic scheduling. However, they 
are typically hard-coded for a single solver, a pre-defined set of test conditions 
and do not support handling of (explicit) existential variables. Also, they usually 
only support formulas F made up of a single ask test condition. These restric- 
tions considerably simplify the implementation of the construct. 

This paper discusses the compilation of an ask construct with arbitrary ask- 
constraints, that allows the programmer to write code which closely resembles 
the logical specification. In particular, our contributions are as follows: 

— We show how to extend an ask-test implemented by some underlying solver 
to a full ask constraint supporting dynamic scheduling. 

— We show how to compile complex ask constraints which include existential 
variables and involve more than one solver, to the primitive ask-tests sup- 
ported by the solvers. 

— We show that the approach is feasible using an implementation in HAL [1].



2 Ask Constraints as High-Level Dynamic Scheduling 

This section formalizes the syntax, logical semantics and operational semantics 
of our ask construct. Its basic syntax is as follows: 

( <ask-formula>_1 ==> goal_1 & ... & <ask-formula>_n ==> goal_n )

where an <ask-formula>_i is a formula made up of primitive ask constraints,
disjunction, conjunction, and existential quantification. Formally:

<ask-formula> ::= <ask-constraint>                    (primitive constraint)
<ask-formula> ::= <ask-formula> ';' <ask-formula>     (disjunction)
<ask-formula> ::= <ask-formula> ',' <ask-formula>     (conjunction)
<ask-formula> ::= exists <var-list> <ask-formula>     (existential quantification)

where <ask-constraint> represents some primitive ask constraint provided by
a solver, <var-list> is a list of variables, and each ask-constraint/goal pair is
referred to as an ask branch.

The operational semantics of the above construct is as follows. As soon as 
the solver(s) determine that an ask-formula is entailed, the corresponding de- 
layed goal is called. The remaining ask branches will then never be considered, 
thus effectively committing to one branch. This commit behaviour can be easily
avoided by specifying separate ask constructs for each branch. Note that ask
constraints are monotonic, i.e., once they hold at a point during a derivation,
they will always hold for the rest of the derivation. The advantage of monotonic- 
ity is that delayed goals need only be executed once, as soon as the associated 
ask constraints are entailed. 

The declarative semantics of an ask branch F ==> G is simply logical implication
$F \rightarrow G$. The semantics of the whole construct (F_1 ==> G_1 & ... & F_n ==> G_n)
is a conjunction of implications, but in order to agree with the commit the
programmer must promise that the individual implications agree. That is, that for
program P:

$\mathcal{D} \wedge P \models (F_i \wedge F_j) \rightarrow (G_i \leftrightarrow G_j)$

In other words, if the ask construct wakes on the formula $F_i$ causing $G_i$ to
execute, and later formula $F_j$ is implied by the store, then $G_j$ is already entailed
by $G_i$ and need not be executed. Note that under these conditions the commit is
purely used for efficiency: it will not change the logical semantics of the program,
although it may of course change the operational behaviour since the underlying
solvers are likely to be incomplete.

Example 2. Consider the following implementation of the predicate either(X,Y)
which holds iff X or Y is true:

either(X,Y) :- ( X = 0 ==> Y = 1 & Y = 0 ==> X = 1).

The logical reading is $(X = 0 \rightarrow Y = 1) \wedge (Y = 0 \rightarrow X = 1)$, which is equivalent
to $(X = 1 \vee Y = 1)$ in the Boolean domain. If both ask constraints are made
true ($X = 0 \wedge Y = 0$), the right hand sides are equivalent ($Y = 1 \leftrightarrow X = 1$) to
false. □
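
The claims of Example 2 can be checked by enumerating the Boolean domain. The Python sketch below is an illustration only; the helper names implies, ask_reading and disjunction are hypothetical. It confirms that the logical reading of either/2 coincides with the disjunction, and that the two goals agree whenever both ask constraints hold.

```python
from itertools import product


def implies(p, q):
    """Material implication p -> q."""
    return (not p) or q


def ask_reading(x, y):
    """Logical reading of the ask construct: (X=0 -> Y=1) and (Y=0 -> X=1)."""
    return implies(x == 0, y == 1) and implies(y == 0, x == 1)


def disjunction(x, y):
    """The intended meaning: X = 1 or Y = 1."""
    return x == 1 or y == 1


booleans = list(product((0, 1), repeat=2))

# The two readings agree on all four Boolean valuations.
equivalent = all(ask_reading(x, y) == disjunction(x, y) for x, y in booleans)

# Agreement condition: where both asks hold (X=0 and Y=0), the goals
# Y=1 and X=1 have the same truth value (both are false).
goals_agree = all((y == 1) == (x == 1) for x, y in booleans if x == 0 and y == 0)
```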

The proposed ask construct is indeed quite versatile. The following examples 
show how to use it to implement reification constraints, build constraints that 
involve more than one solver, and implement negation. 

Example 3. A reified constraint $b \Leftrightarrow c$ constrains the Boolean variable b to be
true if c is implied by the store, and b to be false if $\neg c$ is implied by the store,
and vice versa. Consider defining a predicate $B \Leftrightarrow \exists Y. X = [Y, Y]$ which “reifies”
the right hand side. Note that the right hand side is equivalent to
$\exists E_1 \exists E_2. X = [E_1, E_2] \wedge E_1 = E_2$. This can be implemented using ask constraints as

reifcomp(B,X) :-
    ( B=0, (exists [E1,E2] X = [E1,E2]) ==> X=[E1,E2], E1 ≠ E2
    & B=1 ==> X=[Y,Y]
    & exists [Y] X=[Y,Y] ==> B=1
    & X=[] ; (exists [E1] X=[E1]) ;
      (exists [E1,E2,R] X=[E1,E2|R], (E1 ≠ E2 ; R ≠ [])) ==> B=0).

These definitions assume X only takes on list values. □
Example 4. The following program defines a length constraint which involves
variables from a finite domain constraint solver, and from a Herbrand constraint
solver for lists, and propagates information from one to the other:

length(L,N) :- ( N = 0 ; L = [] ==> N = 0, L = []
               & N >= 1 ; (exists [U1,U2] L = [U1|U2]) ==>
                     L = [_|L1], N >= 1, length(L1,N-1)). □

Example 5. Consider the following definition of disequality

neq(X,Y) :- (X = Y ==> fail).

This (very weak) implementation of disequality waits until the arguments are 
constrained to be equal and then causes failure. □ 

3 Compiling Primitive Ask Constructs 

Let us now examine how to compile a primitive ask construct (i.e., one in which 
the left hand side of every ask branch is a single ask constraint) to the low-level 
dynamic scheduling supported by HAL¹.

3.1 Low-Level Dynamic Scheduling in HAL 

HAL [1] provides four low-level type class methods that can be combined to
implement dynamic scheduling: get_id(Id), which returns an unused identifier
for the delay construct; delay(Event,Id,Goal), which takes a solver event, an
id and a goal, and stores the information in order to execute the goal whenever
the solver event occurs; kill(Id), which causes all goals delayed for the input id
to no longer wake up; and alive(Id), which succeeds if the input id is still alive.
In order for a constraint solver in HAL to support delay, it must provide an
implementation of the delay/3 method. Implementations of get_id/1, kill/1
and alive/1 can either be given by the solver or be re-used from some other
source (such as the built-in system implementations). For more details see e.g. [1].

There are three major differences between HAL’s dynamic scheduling and our
ask construct. First, solver events (Event) are single predicates representing an
event in the underlying solver, such as “the lower bound of variable X changes”. 
No conjunction, disjunction or existential quantification of events is allowed. 
Second, solver events need not be monotonic. Indeed, the example event “lower 
bound has changed” is clearly not. And third, a delayed goal will be re-executed 
every time its associated solver event occurs, until its id is explicitly killed. 

Example 6. A finite domain integer (fdint) solver in HAL supporting dynamic 
scheduling typically provides the following solver events: 

fixed(V) The domain of V reduces to a single value; 

lbc(V) The lower bound of V changes (increases); 

ubc(V) The upper bound of V changes (decreases); 

dc(V) The domain of V changes (reduces). 

¹ The compilation scheme can be adapted straightforwardly to other dynamic
scheduling systems supporting delay identifiers.

Note that solver events do not need to be mutually exclusive: if the domain
{1,3,5} of X changes to {1}, the events fixed(X), ubc(X) and dc(X) all occur.

Using the above events, a bounds propagator for the constraint X ≥ Y can
be written as

geq(X,Y) :- get_id(Id), delay(lbc(Y), Id, set_lb(X,max(lb(Y),lb(X)))),
            delay(ubc(X), Id, set_ub(Y,min(ub(X),ub(Y)))).

where lb (and ub) are functions returning the current lower (and upper) bound 
of their argument solver variable. Likewise, set_lb (and set_ub) set the lower 
(and upper) bound of their first argument solver variable to the second argument. 
The code gets a new delay id, and creates two delaying goals attached to this 
id. The first executes every time the lower bound of Y changes, enforcing X to 
be greater than this bound. The second implements the reverse direction. □ 
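
The behaviour of the four methods and of the geq propagator can be simulated in a few lines of Python. The sketch below is a toy model under stated assumptions, not HAL's implementation: the classes Scheduler and FDVar and the encoding of events as (name, variable) pairs are hypothetical. It shows that a delayed goal runs on every occurrence of its event until its id is killed.

```python
class Scheduler:
    """Toy model of get_id/delay/kill/alive (hypothetical, not HAL's code)."""

    def __init__(self):
        self.next_id = 0
        self.dead = set()
        self.delayed = []  # entries are (event, id, goal)

    def get_id(self):
        self.next_id += 1
        return self.next_id

    def delay(self, event, ident, goal):
        self.delayed.append((event, ident, goal))

    def kill(self, ident):
        self.dead.add(ident)

    def alive(self, ident):
        return ident not in self.dead

    def fire(self, event):
        # A goal is re-executed on every occurrence until its id is killed.
        for ev, ident, goal in list(self.delayed):
            if ev == event and self.alive(ident):
                goal()


class FDVar:
    """Integer variable with bounds; bound changes raise lbc/ubc events."""

    def __init__(self, name, lb, ub, sched):
        self.name, self.lb, self.ub, self.sched = name, lb, ub, sched

    def set_lb(self, value):
        if value > self.lb:
            self.lb = value
            self.sched.fire(("lbc", self.name))

    def set_ub(self, value):
        if value < self.ub:
            self.ub = value
            self.sched.fire(("ubc", self.name))


def geq(x, y, sched):
    """Bounds propagator for X >= Y, following the geq/2 definition above."""
    ident = sched.get_id()
    sched.delay(("lbc", y.name), ident, lambda: x.set_lb(max(y.lb, x.lb)))
    sched.delay(("ubc", x.name), ident, lambda: y.set_ub(min(x.ub, y.ub)))
    return ident


sched = Scheduler()
x = FDVar("X", 0, 10, sched)
y = FDVar("Y", 0, 10, sched)
gid = geq(x, y, sched)
y.set_lb(4)  # fires lbc(Y): the lower bound of X is raised to 4
x.set_ub(7)  # fires ubc(X): the upper bound of Y is lowered to 7
```

After sched.kill(gid), further bound changes no longer wake the two goals, which is the behaviour the ask construct relies on to implement commit.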

In addition to the four methods introduced above, HAL supports an “asks” 
declaration initially introduced into HAL to support the compilation of con- 
straint handling rules that interact with arbitrary solvers [2]. The declaration 
allows constraints solvers to declare the relationship between a predicate im- 
plementing constraint c as a tell constraint, the predicate implementing c as 
an ask-test (a one-off test), and the list of solver events which might indicate 
the answer to the ask-test has changed and, therefore, the ask-test should be 
re-executed. Its syntax is as follows: 

<ask-test> asks <tell-constraint> wakes <wakes-list>.

Example 7. The finite domain solver introduced in Example 6 might define the
ask-test and declaration for the geq constraint as follows.

ask_geq(X,Y) asks geq(X,Y) wakes [lbc(X),ubc(Y)].
ask_geq(X,Y) :- lb(X) >= ub(Y).

The predicate ask_geq defines the ask-test for the geq constraint, and should be
revisited when either the lower bound of X or the upper bound of Y changes. □



3.2 From Ask-Tests to Primitive Ask Constructs

The low-level dynamic scheduling of HAL allows us to compile the primitive ask 
construct: 

( c_1(X^1_1, ..., X^1_{m_1}) ==> Goal_1 & ... & c_n(X^n_1, ..., X^n_{m_n}) ==> Goal_n )

if for each c_i, 1 ≤ i ≤ n, there exists the associated asks declaration:

ask_c_i(X_1, ..., X_{m_i}) asks c_i(X_1, ..., X_{m_i}) wakes [event_{i1}, ..., event_{in_i}].

This is done by replacing the ask construct with:

get_id(Id), delay_c_1(X^1_1, ..., X^1_{m_1}, Id, Goal_1), ...,
            delay_c_n(X^n_1, ..., X^n_{m_n}, Id, Goal_n)

and generating for each delay_c_i the following code:

delay_c_i(X_1, ..., X_{m_i}, Id, Goal) :-
    ( alive(Id) -> ( ask_c_i(X_1, ..., X_{m_i}) -> kill(Id), call(Goal)
                   ; Retest = retest_c_i(X_1, ..., X_{m_i}, Id, Goal),
                     delay(event_{i1}, Id, Retest), ..., delay(event_{in_i}, Id, Retest) )
    ; true ).

retest_c_i(X_1, ..., X_{m_i}, Id, Goal) :-
    ( ask_c_i(X_1, ..., X_{m_i}) -> kill(Id), call(Goal) ; true ).

The code for delay_c_i first checks if the Id is alive, and if so determines
whether or not the constraint already holds by calling the ask-test ask_c_i.
If so, the Id is killed, the Goal is immediately called, and no other action is
necessary. If the constraint is not yet entailed, a closure for the retesting predicate
retest_c_i is associated to each of the relevant solver events so that, each time
a relevant solver event occurs, retest_c_i is executed. This predicate checks
whether the ask-test now succeeds and, if so, kills the Id and executes the goal.
Note that the delay predicate for each solver used in the ask construct must 
support the same delay id type. 
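
The generated delay and retest predicates can be mimicked in Python to see the commit behaviour at work. This is a sketch under stated assumptions: the class Sched stands in for HAL's delay machinery, delay_c is a generic analogue of the generated predicates, and events are plain strings fired by hand. It sets up the two branches of a min-like construct on one shared id.

```python
class Sched:
    """Minimal stand-in for HAL's delay machinery (an assumption of this sketch)."""

    def __init__(self):
        self.count, self.dead, self.delayed = 0, set(), []

    def get_id(self):
        self.count += 1
        return self.count

    def delay(self, event, ident, goal):
        self.delayed.append((event, ident, goal))

    def kill(self, ident):
        self.dead.add(ident)

    def alive(self, ident):
        return ident not in self.dead

    def fire(self, event):
        for ev, ident, goal in list(self.delayed):
            if ev == event and self.alive(ident):
                goal()


def delay_c(sched, ask_test, events, ident, goal):
    """Analogue of delay_c_i: run goal once, as soon as ask_test succeeds."""

    def retest():
        # Analogue of retest_c_i: re-run the ask-test when an event occurs.
        if ask_test():
            sched.kill(ident)
            goal()

    if sched.alive(ident):
        if ask_test():
            sched.kill(ident)
            goal()
        else:
            for event in events:
                sched.delay(event, ident, retest)


# Bounds of X and Y, updated by hand to play the role of the solver.
lb = {"X": 0, "Y": 0}
ub = {"X": 9, "Y": 9}
log = []

sched = Sched()
ident = sched.get_id()
delay_c(sched, lambda: lb["X"] >= ub["Y"], ["lbc(X)", "ubc(Y)"], ident,
        lambda: log.append("Z = Y"))
delay_c(sched, lambda: lb["Y"] >= ub["X"], ["lbc(Y)", "ubc(X)"], ident,
        lambda: log.append("Z = X"))

ub["Y"] = 5
sched.fire("ubc(Y)")  # geq(X,Y) is not yet entailed
lb["X"] = 5
sched.fire("lbc(X)")  # now lb(X) >= ub(Y): the first branch commits
lb["Y"] = 5
ub["X"] = 5
sched.fire("lbc(Y)")  # the id is dead, so the second branch never runs
sched.fire("ubc(X)")
```

Because both branches share one id, killing it when the first ask-test succeeds disables the other branch, even though its ask-test would later succeed as well.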

Example 8. Consider the asks declaration of Example 7. The compilation of

min(X,Y,Z) :- ( geq(X,Y) ==> Z = Y & geq(Y,X) ==> Z = X).

results in

min(X,Y,Z) :- get_id(Id), delay_geq(X,Y,Id,Z = Y), delay_geq(Y,X,Id,Z = X).

delay_geq(X,Y,Id,Goal) :-
    ( alive(Id) -> ( ask_geq(X,Y) -> kill(Id), call(Goal)
                   ; Retest = retest_geq(X,Y,Id,Goal),
                     delay(lbc(X),Id,Retest), delay(ubc(Y),Id,Retest) )
    ; true ).

retest_geq(X,Y,Id,Goal) :- ( ask_geq(X,Y) -> kill(Id), call(Goal) ; true ). □



4 Compiling Disjunctions and Conjunctions 

Conjunctions and disjunctions in ask formulae can be compiled away by taking 
advantage of the following logical identities: 

1. Disjunctive implication: $(a \vee b) \rightarrow c$ is equivalent to $(a \rightarrow c) \wedge (b \rightarrow c)$; and

2. Conjunctive implication: $(a \wedge b) \rightarrow c$ is equivalent to $(a \rightarrow (b \rightarrow c))$.

Disjunctive implication is used to replace the branch

<ask-formula>_1 ; <ask-formula>_2 ==> Goal

in a construct, by the two branches

<ask-formula>_1 ==> Goal & <ask-formula>_2 ==> Goal

The two programs are operationally equivalent: the delayed Goal will be
called once, after either <ask-formula>_1 or <ask-formula>_2 (whichever is first)
holds. Similarly, conjunctive implication is used to replace the construct

<ask-formula>_1 , <ask-formula>_2 ==> Goal

by the construct:

( <ask-formula>_1 ==> ( <ask-formula>_2 ==> Goal ) )

Again, the new code is operationally equivalent: the delayed goal Goal will only
be called once, after both <ask-formula>_1 and <ask-formula>_2 hold.

Note that the above simple conjunctive transformation cannot be directly
applied to a branch appearing in a construct with 2 or more branches, because
of the interaction with commit (the entire construct would be killed as soon
as <ask-formula>_1 held, even if <ask-formula>_2 never did). We can solve this
problem by using a newly created (local) delay id (LId) representing the delay
on <ask-formula>_1 while using the original (global) delay id (GId) for the internal
delay (since if <ask-formula>_2 also holds, the whole ask construct can commit).

An added complexity is that, for efficiency, we should kill the local delay id
LId whenever GId is killed (if, say, another branch commits) so that the low-level
HAL machinery does not re-execute ( <ask-formula>_2 ==> Goal ) every time the
events associated to <ask-formula>_1 become true. In order to do so we introduce
the predicate register(LId,GId) which links LId to GId, so that if GId is ever
killed, LId is also killed.
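
One possible way to realise such a link is to keep, for each id, the list of ids registered with it and to propagate kills along these links. The Python sketch below is an assumption about how this could be done, not a description of HAL's implementation; the class DelayIds and its fields are hypothetical.

```python
class DelayIds:
    """Delay ids with kill propagation from a global id to its local ids."""

    def __init__(self):
        self.count = 0
        self.dead = set()
        self.children = {}  # global id -> local ids registered with it

    def get_id(self):
        self.count += 1
        return self.count

    def register(self, lid, gid):
        """Link lid to gid, so that killing gid also kills lid."""
        self.children.setdefault(gid, []).append(lid)

    def kill(self, ident):
        if ident in self.dead:
            return
        self.dead.add(ident)
        for child in self.children.get(ident, []):
            self.kill(child)

    def alive(self, ident):
        return ident not in self.dead


ids = DelayIds()
gid = ids.get_id()
lid = ids.get_id()
ids.register(lid, gid)
nested = ids.get_id()
ids.register(nested, lid)  # local ids can themselves have local ids
other = ids.get_id()       # an unrelated id
ids.kill(gid)              # kills gid, lid and nested, but not other
```

The link is one-directional: killing a local id leaves the global id alive, which matches the requirement that only the inner delay may commit the whole construct.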

Example 9. Consider the compilation of 

p(X,Y,Z,T) :- ( (X >= Y ; X >= Z) ==> Z = T & (Y >= X, Z >= X) ==> X = T).

The resulting code is

p(X,Y,Z,T) :- get_id(GId),
    delay_geq(X,Y,GId,Z = T), delay_geq(X,Z,GId,Z = T),
    get_id(LId), register(LId,GId),
    delay_geq(Y,X,LId,delay_geq(Z,X,GId,X = T)). □

By iteratively applying these rules, we can remove all conjunctions and dis- 
junctions from ask formulae (without existential quantifiers). 
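
The iterative application of the two rules can be sketched as a recursive rewrite over a small formula tree. In the Python sketch below the tuple encoding ("prim", c), ("or", f1, f2), ("and", f1, f2) and the function compile_branch are hypothetical, and the bookkeeping of local and global delay ids is deliberately left out; only the shape of the resulting branches is computed.

```python
def compile_branch(formula, goal):
    """Return a list of (primitive, goal) branches equivalent to formula ==> goal.

    A nested ask construct is represented as ("ask", branches).
    """
    kind = formula[0]
    if kind == "prim":
        return [(formula[1], goal)]
    if kind == "or":
        # (a ; b) ==> G  becomes  a ==> G & b ==> G
        return compile_branch(formula[1], goal) + compile_branch(formula[2], goal)
    if kind == "and":
        # (a , b) ==> G  becomes  a ==> (b ==> G)
        inner = ("ask", compile_branch(formula[2], goal))
        return compile_branch(formula[1], inner)
    raise ValueError(f"unknown formula kind: {kind}")


# The two branches of Example 9.
first = compile_branch(("or", ("prim", "X >= Y"), ("prim", "X >= Z")), "Z = T")
second = compile_branch(("and", ("prim", "Y >= X"), ("prim", "Z >= X")), "X = T")
```

Here first consists of two primitive branches with the same goal, while second is a single primitive branch whose goal is itself an ask construct, mirroring the nested delay_geq call in the compiled code of Example 9.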

5 Normalization and Existential Quantification 

One of the first steps performed by HAL during compilation is program nor- 
malization, which ensures that every function and predicate has variables as 
arguments. The normalization exhaustively applies the following rules: 

1. Rewrite $\exists \bar{x}. C \wedge y = f(t_1, \ldots, t_i, \ldots, t_n)$, where f is an n-ary function and $t_i$
is either a non-variable term or a variable equal to some other $t_j$, $j \neq i$, to
$\exists \bar{x} \exists v. C \wedge v = t_i \wedge y = f(t_1, \ldots, v, \ldots, t_n)$, where v is a new variable.

2. Rewrite $\exists \bar{x}. C \wedge c(t_1, \ldots, t_i, \ldots, t_n)$, where c is an n-ary constraint symbol and
$t_i$ is either a non-variable term or a variable equal to some other $t_j$, $j \neq i$, to
$\exists \bar{x} \exists v. C \wedge v = t_i \wedge c(t_1, \ldots, v, \ldots, t_n)$, where v is a new variable.
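
The effect of these rules can be sketched as a recursive flattening procedure. The Python code below is an illustration under stated assumptions rather than the HAL normalizer: variables are strings, non-variable terms are (functor, arguments) pairs, fresh variables are named U1, U2, ..., and the function flatten is hypothetical.

```python
from itertools import count


def flatten(name, args, fresh):
    """Flatten the atom name(args) so that all arguments are distinct variables.

    Returns (flat_args, new_vars, equations), where each equation is a
    triple (v, "=", term) introduced for a new variable v.
    """
    flat_args, new_vars, equations, seen = [], [], [], set()
    for arg in args:
        if isinstance(arg, str) and arg not in seen:
            # First occurrence of a variable: keep it as it is.
            seen.add(arg)
            flat_args.append(arg)
            continue
        v = f"U{next(fresh)}"  # the new variable v of the rewrite rules
        new_vars.append(v)
        flat_args.append(v)
        if isinstance(arg, str):
            # Repeated variable: add v = t_i.
            equations.append((v, "=", arg))
        else:
            # Non-variable term: flatten it, then add v = f(...).
            fname, fargs = arg
            sub_args, sub_vars, sub_eqs = flatten(fname, fargs, fresh)
            new_vars.extend(sub_vars)
            equations.extend(sub_eqs)
            equations.append((v, "=", (fname, tuple(sub_args))))
    return flat_args, new_vars, equations


# T1 + D1 > T2 becomes U1 = +(T1,D1), U1 > T2.
ask = flatten(">", [("+", ["T1", "D1"]), "T2"], count(1))

# p(X,X) becomes U1 = X, p(X,U1).
repeated = flatten("p", ["X", "X"], count(1))
```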

Example 10. Consider the following definition of a before-or-after constraint for
two tasks with start times T1 and T2 and durations D1 and D2, which implements
$T2 \geq T1 + D1 \vee T1 \geq T2 + D2$ without creating a choice.

before_after(T1,D1,T2,D2) :- (T1 + D1 > T2 ==> T1 >= T2 + D2),
                             (T2 + D2 > T1 ==> T2 >= T1 + D1).

which will have the body normalized into:

(exists [U1] U1 = +(T1,D1), U1 > T2 ==> U2 = +(T2,D2), T1 >= U2),
(exists [U3] U3 = +(T2,D2), U3 > T1 ==> U4 = +(T1,D1), T2 >= U4).

thus adding the existentially quantified variables U1, ..., U4. While the explicit
quantification can be omitted for the variables appearing in the tell constraints 
on the right hand side (U2 and U4), this is not true for the ask constraints, since 
the (implicit) existential quantifier escapes the negated context of the ask. □ 

Unfortunately, it is in general impossible to compile existential formulae down 
to primitive ask constraints. Only the solver can answer general questions about 
existential formulae. 

Example 11. Consider an integer solver which supports the constraint X > Y and 
the function X = abs(Y) (which constrains X to be the absolute value of Y). 
The following ask construct ( exists [N] abs(N) = 2, N > 1 ==> Goal ) will 
always hold. However, it is impossible to separate the two primitive constraints 
occurring in the ask formula. Instead, we would have to ask the solver to treat 
the entire conjunction at once. □ 

Thankfully, although normalization can lead to proliferation of existential 
variables in ask formulae, in many cases such existential variables can be com- 
piled away without requiring extra help from the solver. Consider the expression 

$(\exists \bar{x} \exists v. v = f(y_1, \ldots, y_n) \wedge C) \rightarrow G$

If f is a total function, such a v always exists and is unique. Thus, as long as
none of the variables $y_1, \ldots, y_n$ are existentially quantified (i.e., appear in $\bar{x}$) we
can replace the above expression by the equivalent one

$\exists v. v = f(y_1, \ldots, y_n) \wedge ((\exists \bar{x}. C) \rightarrow G)$

Example 12. Returning to Example 10, we can transform the body code to 

U1 = +(T1,D1), (U1 > T2 ==> U2 = +(T2,D2), T1 >= U2),
U3 = +(T2,D2), (U3 > T1 ==> U4 = +(T1,D1), T2 >= U4).

which does not require existential quantifiers in the ask-formula. □ 

There are other common cases that allow us to compile away existential 
variables, but require some support from the solver. Consider the expression 

$(\exists \bar{x} \exists v. v = f(y_1, \ldots, y_n) \wedge C) \rightarrow G$

where f is a partial function and none of the variables $y_1, \ldots, y_n$ appears in $\bar{x}$.
This is equivalent to

$(\exists v. v = f(y_1, \ldots, y_n)) \rightarrow (\exists v. v = f(y_1, \ldots, y_n) \wedge (\exists \bar{x}. C \rightarrow G))$

The result follows since if there exists a v of the form $f(y_1, \ldots, y_n)$, then it is
unique. Hence, the function f in the context of this test is effectively total. This
may not seem to simplify compilation, but if we provide an ask version of the
constraint $\exists v. v = f(y_1, \ldots, y_n)$ then we can indeed simplify the resulting code.
Example 13. Assuming we are dealing with integers, the expression
$(x + 2^y > 2 \rightarrow b = 1)$ is equivalent to
$(\exists z'. z' = 2^y) \rightarrow (\exists z. z = 2^y \wedge (x + z > 2 \rightarrow b = 1))$.
If the compiler knows that the constraint $\exists z'. z' = 2^y$ is equivalent to $y \geq 0$,
compilation of the code

g(X,Y,B) :- (X + 2^Y > 2 ==> B = 1).

results in

g(X,Y,B) :- (Y >= 0 ==> (Z = 2^Y, (X + Z > 2 ==> B = 1))). □
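
The equivalence used in Example 13 can be checked by brute force over a small range of integers. The Python sketch below is only a sanity check under stated assumptions: the function names pow2_int, original and transformed are hypothetical, and the exponentiation is treated as undefined on the integers for negative exponents.

```python
def pow2_int(y):
    """Return 2**y when it is an integer, else None (partial on the integers)."""
    return 2 ** y if y >= 0 else None


def original(x, y, b):
    """(exists z. z = 2^y and x + z > 2) -> b = 1."""
    z = pow2_int(y)
    ask_holds = z is not None and x + z > 2
    return (not ask_holds) or b == 1


def transformed(x, y, b):
    """(y >= 0) -> (z = 2^y and (x + z > 2 -> b = 1))."""
    if y < 0:
        return True  # the outer ask never holds
    z = pow2_int(y)
    return (not (x + z > 2)) or b == 1


# Compare both readings on all valuations in a small box of integers.
same = all(
    original(x, y, b) == transformed(x, y, b)
    for x in range(-4, 5)
    for y in range(-3, 4)
    for b in (0, 1)
)
```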

To use this simplification we need versions of the ask constraint for partial 
functions. These can be provided using the already introduced mechanisms for 
mapping tell constraints to ask constraints. For example, the mapping for $z = 2^y$
for a finite domain solver can be defined as

nonneg(Y) asks exists [Z] Z = 2^Y wakes [lbc(Y)].
nonneg(Y) :- lb(Y) >= 0.

To apply either of the simplifications above we also require information about
total and partial functions. The HAL compiler already receives this information
from the solver in terms of mode declarations. Example mode declarations that
show the totality of + and the partialness of ^ are:

:- mode in + in -> out is det.

:- mode in ^ in -> out is semidet.

Partial functions are common in Herbrand constraints. Consider the constraint
$x = f(y_1, \ldots, y_n)$, where f is a Herbrand constructor. This constraint
defines, among others, a partial (deconstruct) function $f_i^{-1}$ from x to each
$y_i$, $1 \leq i \leq n$. For this reason the compiler produces new ask tests bound_f(X) for
each Herbrand constructor f, which check whether X is bound to the function
f. Herbrand term deconstructions are then compiled as if the asks declaration

bound_f(X) asks exists [Y1,..,Yn] X = f(Y1,..,Yn) wakes [bound(X)]

appeared in the program. Note that in order to use this form of the ask constraint 
we may have to introduce further existential variables. 

Example 14. Consider the compilation of the fragment (exists [Y] X = [Y|Z]
==> p(X,Z)). Although neither transformation seems directly applicable we can
replace $\exists Y. X = [Y|Z]$ by the equivalent $\exists Y \exists V. X = [Y|V] \wedge V = Z$ and then use
the partial function compilation to obtain

'bound_[|]'(X) ==> (X = [Y|V], (V = Z ==> p(X,Z))) □

6 Compiling Equality 

The general compilation scheme presented in previous sections assumes the exis- 
tence of a simple mapping between an ask-test, and a set of solver events which 
indicate the answer to the ask-test may have changed. However, this is not
always true, especially when dealing with structures that mix variables of different
solvers. Consider testing the equality of lists of finite domain integers. To do so
requires consulting both the Herbrand list solver and the fdint solver. 

This problem is exacerbated by the fact that HAL supports polymorphic 
types, where some part of the type may only be known at run-time. Currently, 
the only solvers in HAL which include other solver types are Herbrand solvers 
and, thus, we will focus on them. However, the same issues arise if we wish to 
build sequence, multiset or set solvers over parametric types. 

The problem of multiple types already arises when defining the ask-test ver- 
sion of equality (==/2). We can solve this problem in HAL by using type classes, 
i.e., by defining a type class for the ==/2 method and letting the compiler gen- 
erate instances for this method for each Herbrand type. 

Example 15. The code generated by the HAL compiler for the implementation 
of method ==/2 in the case of the list(T) type is as follows: 

X == Y :- ( (var(X) ; var(Y)) -> X === Y
          ; X = [], Y = [] -> true
          ; X = [X1|X2], Y = [Y1|Y2], X1 == Y1, X2 == Y2 ).

where ===/2 succeeds if its arguments are (unbound) identical variables. □
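
The recursive structure of this ask-test can be mirrored in Python. The sketch below makes several assumptions that are not in the original: unbound variables are instances of a hypothetical class Var compared by object identity, lists are encoded as nested (head, tail) pairs with () as the empty list, and list elements are integers or variables.

```python
class Var:
    """An unbound variable; two variables are identical only if they are the same object."""

    def __init__(self, name):
        self.name = name


NIL = ()


def cons(head, tail):
    """Build the list cell [head|tail]."""
    return (head, tail)


def ask_eq(x, y):
    """One-off test of whether X = Y is already entailed."""
    if isinstance(x, Var) or isinstance(y, Var):
        return x is y  # plays the role of ===/2
    x_is_list, y_is_list = isinstance(x, tuple), isinstance(y, tuple)
    if x_is_list and y_is_list:
        if x == NIL or y == NIL:
            return x == NIL and y == NIL
        return ask_eq(x[0], y[0]) and ask_eq(x[1], y[1])
    if x_is_list or y_is_list:
        return False
    return x == y  # element type: integers in this sketch


a, b = Var("A"), Var("B")
same_var = ask_eq(cons(1, cons(a, NIL)), cons(1, cons(a, NIL)))
distinct_vars = ask_eq(cons(a, NIL), cons(b, NIL))
```

The test is a one-off check: for two distinct unbound variables it simply answers no, and it has to be re-run after later bindings, which is what the wake events discussed next are for.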

Extending the above ask-test to an ask constraint has two problems. First,
the only solver events currently supported by a Herbrand solver are: bound(X),
which occurs if X is bound to a non-variable term; and touched(X), which
occurs if the variable X is bound or unified with another variable (which also
has events of interest). These are the only events supported
because they are the only ones that are independent of the type of the subterms
of X. Thus, the asks declaration can only be defined as:

:- X == Y asks X = Y wakes [touched(X),touched(Y)]

which results in a very weak behaviour since it only notices changes at the
topmost level of X and Y. For example, the goal
?- neq(X,Y), X = f(U), Y = f(V), U = V.
will not fail since, even though X and Y are in the end identical,
the unification of U and V does not create a solver event to retest the equality.

It is possible, though complex, to use overloading to introduce a new overloaded
solver event changed(X) which occurs if any subterm of X is changed in
some way (including unification with a variable). We could then provide an asks
declaration

:- X == Y asks X = Y wakes [changed(X),changed(Y)]

which does not suffer from the above problem. However, there is a second prob-
lem which affects both solutions: repeatedly calling ==/2 from the top of the 
term is inefficient for large terms. A more efficient solution is to only partially 
retest. 

For this we introduce a new ask-test for Herbrand terms, and show how it 
can be used to implement an efficient ask-test version of ==/2. The ask-test is 
samefunctor(X,Y) which holds if X and Y have the same top-level functor, and 
can be implemented in Prolog as follows: 

samefunctor(X,Y) :- ( var(X), X == Y -> true
                    ; nonvar(X), nonvar(Y), functor(X,F,A), functor(Y,F,A) ).

The advantage of this simple ask-test is that it only needs to be re-tested
when touched(X) or bound(Y) occurs, i.e., its asks declaration can be defined as:

samefunctor(X,Y) asks samefunctor(X,Y) wakes [touched(X),bound(Y)].

indicating that we need to recheck the ask-test if X is bound to another variable
(which fires the touched(X) event), since it could have been bound to Y², and
if X or Y is bound (the touched(X) event will fire if X is bound). Note that
the ask-test has no corresponding tell version.

Let us now see how an efficient ask-test version for ==/2 can be written 
using samefunctor. Suppose the type of the Herbrand terms is $g(t_1, \ldots, t_n)$
where $t_1, \ldots, t_n$ are types, and the type constructor g/n has $f_1/m_1, \ldots, f_k/m_k$
functor/arities. Then, the following implements (a high level view of) predicate
'delay_==_g_n' which waits until two terms X and Y of type $g(t_1, \ldots, t_n)$ are
equal to execute Goal. (Id @ AskConstruct) indicates the Id that should be
given to the delay predicate resulting from each branch in AskConstruct.

'delay_==_g_n'(X,Y,GId,Goal) :-
    ( alive(GId) -> get_id(LId), register(LId,GId),
        LId @ ( samefunctor(X,Y) ==>
            ( var(X) -> kill(GId), call(Goal)
            ; X = f_1(X_1,...,X_{m_1}), Y = f_1(Y_1,...,Y_{m_1}),
              GId @ ( X_1 = Y_1, ..., X_{m_1} = Y_{m_1} ==> kill(GId), call(Goal) )
            ; ...
            ; X = f_k(X_1,...,X_{m_k}), Y = f_k(Y_1,...,Y_{m_k}),
              GId @ ( X_1 = Y_1, ..., X_{m_k} = Y_{m_k} ==> kill(GId), call(Goal) )
            ) )
    ; true ).

The code works as follows. If GId is alive, first a new local delay id LId is
created for delay on samefunctor, and this is registered with GId. The whole body
delays on the samefunctor ask constraint. When that holds, we test whether 
the variables are identical (true if either is a variable) and, if so, fire the goal. 
Otherwise, the two functors must be the same. Thus, we find the appropriate 
case and then delay on the conjunction of equality of the arguments. Here we 
can use the global delay identifier GId as the delay id for the ask formulae ap- 
pearing for the arguments since at most one will be set up. The compilation of 
these conjunctions will, of course, introduce new local identifiers. When and if 
the arguments become equal, Goal will be called. Note that if the constructor
fi/mi has arity zero (i.e. mi = 0), then there are no arguments to delay until
equal, and the goal will be immediately called.

The outermost ask construct code contains no explicit delay on equality, 
hence it can be compiled as described in the previous sections. The inner ask 
constructs do contain equality, and will be recursively handled in the same way. 



2 We do not need to delay on touched(Y), since if touched(Y) occurs, causing X and
Y to become identical variables, then touched(X) must have also occurred.
Example 16. The generated code for the type list(T) is:

'delay_==_list_1'(X,Y,GId,Goal) :-
    ( alive(GId) -> get_id(LId), register(LId,GId),
        delay_samefunctor(X,Y,LId,'delay_==_list_1_b'(X,Y,GId,Goal))
    ; true ).

'delay_==_list_1_b'(X,Y,GId,Goal) :-
    ( var(X) -> kill(GId), call(Goal)
    ; X = [], Y = [], kill(GId), call(Goal)
    ; X = [X1|X2], Y = [Y1|Y2], get_id(LId), register(LId,GId),
        'delay_=='(X1,Y1,LId,'delay_=='(X2,Y2,GId,Goal)) ).  □

In order for the solution to work for polymorphic types, the 'delay_==' predicate
is defined as a method for a corresponding type class. HAL automatically
generates the predicate 'delay_==_g_n' for every Herbrand type constructor
g/n that supports delay and creates an appropriate instance. For non-Herbrand
solver types, the instance must be created by the solver writer. Normal over- 
loading resolution ensures that at runtime the appropriate method is called. 

This solution kills two birds with one stone. Firstly, it resolves the problems
of delaying on equality by generating specialized predicates for each type. Secondly,
because the predicate 'delay_==' is overloaded, delay on equality is now
polymorphic. Thus, it is possible to implement a truly polymorphic version of,
for example, the neq/2 constraint. We can similarly implement a polymorphic 
ask constraint for disequality. 

7 Experimental Results 

The purpose of our experimental evaluation is to show that compiling ask con- 
straints is practical, and to compare performance with hand-implemented dy- 
namic scheduling where applicable. In order to do so, a simple prototype ask 
constraint compiler has been built into HAL. It does not yet handle existential 
quantifiers automatically. In the future we plan to extend the compiler to do this 
and also optimize the compilation where possible. All timings are the average 
over 10 runs on a Dual Pentium II 400MHz with 648M of RAM running under 
Linux RedHat 9 with kernel version 2.4.20 and are given in milliseconds. 

The first experiment compares three versions of a Boolean solver written by
extending a Herbrand constraint solver. The first, hand, is implemented using
low-level dynamic scheduling (no compilation required). This is included as the
ideal “target” for high-level compiled versions. The second, equals, implements
the Boolean solver by delaying on equality, much like the either constraint
in Example 2. Here, equals treats X = t as a partial function and delays on
the specialised bound_t(X). Finally, nonvar implements the Boolean solver by
delaying on the nonvar(X) ask-test (which holds if X is bound). Delaying on
nonvar requires fewer delayed goals, since nonvar(X) subsumes both X=t and
X=f. We believe an optimizing ask constraint compiler could translate equals to
nonvar automatically.

Table 1(a) compares the execution times in milliseconds of the Boolean
solvers on a test suite (details explained in [2]). Most of the overhead of nonvar
compared to hand is due to the nonvar code retesting the nonvar ask-test
(which always holds if the retest predicate is woken up). The equals version adds
overhead with respect to nonvar by using a greater number of (more specialised)
ask constraints.

Table 1. Testing ask constraints: (a) Boolean benchmarks, (b) Sequence benchmarks

(a)

Prog            hand  equals  nonvar
pigeon(8,7)      524     988     618
pigeon(24,24)    157     338     167
schur(13)          7      11       5
schur(14)         57      87      65
queens(18)      4652    9333    5313
mycie(4)        1055    1988    1218
fulladder(5)     260     410     320
Geom. mean       237    179%    107%

(b)

Prog            poly    mono
neq(10000)       575     413
neq(20000)      1146     848
neq(40000)      2312    1676
neq(80000)      4592    3362
square(4)        414     219
square(5)       5563    2789
square(6)       2476    1213
square(7)        358     175
square(8)      12816    6168
triples(3)        88      70
triples(4)       535     436
triples(5)      4200    3526
Geom. mean      1349     64%

Our second experiment, shown in Table 1(b), compares two versions of a
sequence (Herbrand lists of finite domain integers) solver built using both a Herbrand
solver for lists, and a finite domain (bounds propagation) solver. The resulting
sequence solver provides three ask constraints over “complex” structures:
length(Xs,L) (see Example 4), append(Xs,Ys,Zs) which constrains Zs to be
the result of appending Xs and Ys (concatenation constraint), and neq(Xs,Ys)
(see Example 5). The first benchmark neq(n) calls a single neq(Xs,Ys) constraint,
then iteratively binds Xs and Ys to a list of length n (which eventually
leads to failure). The second benchmark square(n) tries to find an n × n square
of 1s and 0s such that no two of the rows, columns and diagonals (in both
directions) are equal.
This is solved by first building the sequences for each row, column, diagonal and
the reverse, then making each not equal to each other via the neq constraint,
and then labeling. Here, square(4) has no solution, but square(5-8) do. The
third benchmark triples(n) tries to find n triples of sequences of 1s and 0s
such that (1) the length of each sequence is < n; (2) each sequence is not equal
to any other sequence (from any triple); and (3) the concatenation for all triples
must be equal. This example makes use of all three constraints, length, append
and neq. All of triples(3-5) have solutions.

We use two versions of the sequence constraints. The first, poly, uses the
polymorphic delay on equality for the neq constraints. The second, mono, is a
hand-edited version of poly where (1) all polymorphism has been specialised; and
(2) a more efficient representation of the global id type is used. We can only use
this more efficient global id type if we know in advance the types of the local ids,
something not possible when using polymorphism. We see that, overall, mono is
36% faster than the more naive poly.
Another interesting result is the linear behaviour of the neq(n) benchmarks
with respect to n. As each list becomes more instantiated, we do not retest for 
equality over the entire list, rather we only retest the parts that have changed. 
If we retested the entire list, then we would expect quadratic behaviour. 
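
The difference between the two retesting strategies can be illustrated with a small counting experiment. The Python sketch below is not the HAL implementation; the function names and the model (two lists that become instantiated one position at a time, with an equality check after each step) are assumptions made for illustration.

```python
def comparisons_full_retest(xs, ys):
    """Recheck the whole known prefix each time one more position becomes bound."""
    count = 0
    for known in range(1, len(xs) + 1):
        for i in range(known):
            count += 1
            if xs[i] != ys[i]:
                return count
    return count


def comparisons_incremental(xs, ys):
    """Check only the newly bound position; earlier positions are already known equal."""
    count = 0
    for i in range(len(xs)):
        count += 1
        if xs[i] != ys[i]:
            return count
    return count


short = [0] * 1000
long = [0] * 2000
full_short = comparisons_full_retest(short, short)
full_long = comparisons_full_retest(long, long)
inc_short = comparisons_incremental(short, short)
inc_long = comparisons_incremental(long, long)
```

Doubling the list length doubles the incremental count but roughly quadruples the full-retest count, which matches the linear versus quadratic behaviour described above.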

8 Related Work and Conclusions 

Ask constraints are closely related to dynamic scheduling, CHRs, reification and 
concurrent constraint programming. 

The closest relation to this work is the when declarations of SICStus Prolog 
that allow a goal to delay until a test succeeds. These tests are similar to ask 
formulae (omitting existential quantification) over the primitive ask constraints 
nonvar(X), ground(X) and ?=(X,Y). The last succeeds when X and Y are ei- 
ther known to be equal or not equal. The SICStus compiler appears to do much 
the same translation of conjunctions and disjunctions as defined in Section 4, 
but does not allow (explicit) existential quantification. The ?=(X,Y) constraint
includes the functionality of the ask equals defined in Section 6, but the SICStus
implementation only deals with a single constraint solver (Herbrand).
The second difference is that the SICStus implementation does not break down
testing of equality so that previously equal parts need not be retested.

CHRs are also closely related to this work, since an ask constraint is analogous 
to the guard of a CHR rule. We can consider the CHR rule (H <=> G | B) as
equivalent to (H <=> (G ==> B)) using ask constraints. This translation is generally
inefficient, as delayed goals will be set up for every possible matching of H 
against the CHR store, and it is incompatible with some CHR optimisations, 
e.g. join ordering [3] and wakeup specialisation [2]. Instead, the guards for all 
rules are considered as a whole, and delayed goals are set up which may check 
multiple rules if a solver event occurs (see [2] for more details). Another difference
is that non-Herbrand existential variables are not yet handled by any CHR
implementation we are aware of; this remains future work.

Reified constraints allow similar functionality to ask constraints, particularly 
when combined with delaying an arbitrary goal until a Boolean variable is true. 
Both SICStus Prolog and ECLiPSe support reification of various constraints in 
their finite domain (and finite set) solvers, including conjunction, disjunction 
and implication. Again they do not handle explicit existential quantification. 

One of the advantages of ask constraints over reification is they allow us to 
implement reified complex constraints which cannot be implemented using reifi- 
cation alone due to the interaction with existential quantifiers, as in Example 3. 
In that sense the ask construct is strictly more expressive than reification alone. 

In both SICStus and ECLiPSe existential variables arising through normalization
appear to be treated using the total function simplification described in
Section 5; this can lead to erroneous behaviour. For example, in ECLiPSe the
goal ic:(Y < 0), ic:(B =:= (X + sqrt(Y) >= 2)), analogous to Example 13,
incorrectly fails rather than setting B = 0.

Guarded Horn Clauses [8] allows the programming of behaviour equivalent
to ask formulae for Herbrand constraints including conjunction, disjunction, and
implicit existential quantifiers. The cc(FD) [9] language includes a blocking implication
similar to ask constraints without commit, but only allows single constraints
on the left hand side. However, one could use the cardinality constraint
to mimic conjunction and disjunction. Both approaches treat a single solver, and 
do not handle explicit existential quantifiers. 

Oz supports complex ask formulae using constraint combinators [7]. Here ask
constraints are executed in a separate constraint store which is checked for en- 
tailment by the original constraint store. This is a powerful approach which can 
handle examples that our approach cannot. However, its handling of existential 
variables is weaker than ours. For instance, the Oz equivalent to Example 13 
will not set B to 1 when X > 0 and Y > 1. It would be interesting to extend 
Oz to handle existential variables better. 

A constraint programming language supporting multiple solvers should sup- 
port compilation of complex ask constraints. In this paper we have defined a 
solver-independent approach to this compilation, implemented it in HAL, and 
shown the resulting approach is practical and expressive. There is a significant 
amount of improvement that can be made to the naive compilation strategy 
defined here, by transformations such as collecting calls for the same event. In 
the future we plan to investigate several optimizations. 
Abstract. Both Constraint Handling Rules (CHR) and tabling - as implemented
in XSB - are powerful enhancements of Prolog systems, based
on fix point computation. Until now they have only been implemented 
in separate systems. This paper presents the work involved in porting 
a CHR system to XSB and in particular the technical issues related to 
the integration of CHR with tabled resolution. These issues include call 
abstraction, answer projection, entailment checking, answer combination 
and tabled constraint store representations. Different optimizations re- 
lated to tabling constraints are evaluated empirically. The integration 
requires no changes to the tabling engine. We also show that the per- 
formance of CHR programs without tabling is not affected. Now, with 
the combined power of CHR and tabling, it is possible to easily intro- 
duce constraint solvers in applications using tabling, or to use tabling in 
constraint solvers. 



1 Introduction 

XSB (see [19]) is a standard Prolog system extended with tabled resolution. 
Tabled resolution is useful for recursive query computation, allowing programs 
to terminate in many cases where Prolog does not. 

Parsing, program analysis, model checking, data mining, diagnosis and many 
more applications benefit from tabled resolution. We refer the reader to [1] for 
a coverage of XSB’s SLG execution strategy. 

Constraint Handling Rules, or CHR for short, are a high level rule-based 
language (see [10]). As opposed to the top-down execution of XSB, it performs a 
bottom-up fixpoint computation. While CHR currently has many applications, it
has been designed for writing constraint solvers in particular. Indeed, its compact 
and declarative syntax is excellent for prototyping succinct constraint solvers. 
Although its performance is not on par with constraint solvers written in lower- 
level languages, its flexibility and ease of use do favor a wider use. We refer the 
reader to [11] for an overview of CHR. 

CHR is not a self-contained but an embedded language. Although versions of 
CHR for Haskell and Java do exist, Prolog is its original and natural host. Just 
as the traditional combination of Constraint and Logic Programming [14], the 
combination of CHR with Logic Programming seems to be the most powerful 
and expressive combination. 

* Research Assistant of the Fund for Scientific Research - Flanders (Belgium) 
However, by integrating CHR with not just any standard Prolog system, but
XSB with tabled resolution, we get an even more powerful and expressive system. 
This system integrates both the bottom-up and top-down fixpoint computations, 
the favorable termination properties of XSB and the constraint programming of 
CHR. This combined power enables programmers to easily write highly declar- 
ative programs that are easier to maintain and extend. 

This proposed integration implies quite a number of implementation chal- 
lenges that have to be dealt with. Firstly, a CHR system for non-tabled use in 
XSB is needed, as it does not come with one already. In Section 2 we explain how 
an existing CHR system was ported to XSB with minimal changes to that CHR 
system. The attributed variables implementation in XSB was extended to full 
functionality in the process. Benchmarks show that no performance overhead 
is incurred for non-tabled use and that it is even on par with the commercial 
SICStus Prolog system. 

Secondly, the CHR system requires integration with tabled execution. In 
Section 3 we present the different interaction issues. We propose a solution to 
the conflict of the global CHR store with the required referential transparency 
of tabled predicates. Also two representations of tabled constraints are analyzed 
and a mechanism for answer constraint projection is presented. In addition, 
we study issues regarding answer entailment checking and advanced techniques 
for aggregate-like answer combination. The performance impact of tabling and 
several of the mentioned concepts are measured on a small constraint program. 
Throughout, we propose a declaration-based approach for enabling particular 
mechanisms that, in the spirit of both CHR and XSB, preserves the ease of use. 

Finally, we conclude this paper in Section 4 and discuss related and possible 
future work. 

2 The CHR System 

Initially the CHR system described in this paper was written for the hProlog 
system. hProlog is based on dProlog (see [8]) and intended as an alternative 
backend to HAL (see [7]) next to the current Mercury backend. The initial 
intent of the implementation of a CHR system in hProlog was to validate the 
underlying implementation of dynamic attributes (see [6]). 

The hProlog CHR system consists of a preprocessor and a runtime system: 

— The preprocessor compiles embedded CHR rules in Prolog program files into 
Prolog code. The compiled form of CHR rules is very close to that of the 
CHR system by Christian Holzbaur, which is used in SICStus [13] and YAP
[3]. The preprocessor allows for experimentation with optimized compilation
of CHR rules, both through static inference and programmer declarations. 

— The runtime system is nearly identical to that of Christian Holzbaur: sus- 
pended constraints are stored in a global constraint store. Variables in sus- 
pended constraints have attributes on them that function as indexes into 
this global store. Binding of these attributed variables causes the suspended 
constraints on them to trigger again. 
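
The triggering mechanism described in the second item can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class AttrVar and its methods suspend and bind are hypothetical names, and the real runtime stores suspension terms in a global store rather than plain callables.

```python
class AttrVar:
    """An unbound variable whose attribute is the list of goals suspended on it."""
    def __init__(self):
        self.value = None
        self.suspensions = []

    def suspend(self, goal):
        """Attach a suspended goal so that it is indexed by this variable."""
        self.suspensions.append(goal)

    def bind(self, value):
        """Bind the variable, then re-run every goal that was suspended on it."""
        self.value = value
        woken, self.suspensions = self.suspensions, []
        for goal in woken:
            goal(self)


log = []
x = AttrVar()
x.suspend(lambda v: log.append(("leq_check", v.value)))
x.suspend(lambda v: log.append(("neq_check", v.value)))
x.bind(3)
```

After the call to bind, both suspended goals have run once with the new value, mirroring how binding an attributed variable triggers the constraints suspended on it.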
Little difficulty was experienced while porting the preprocessor and runtime
system from hProlog to XSB. The main problem turned out to be XSB’s overly
primitive interface for attributed variables: it did not support attributes in different
modules. Moreover, the actual binding of attributed variables was not
performed during the unification, but was left up to the programmer of the interrupt
handler. This causes unintuitive and unwanted behavior in several cases:
while the binding is delayed from unification to interrupt handling, other code
that relies on variables being bound, e.g. arithmetic, can be executed in between.
Due to these problems of the current XSB attributed variables, it was deemed
acceptable to model the attributed variables interface and behavior more closely
on that of hProlog. This facilitated the porting of the CHR system considerably.



2.1 Performance Evaluation 

We have not adapted the compilation scheme of the CHR preprocessor in any 
way to accommodate the integration of CHR with tabling (see Section 3). We 
have only given in to a few minor changes for performance reasons, such as 
replacing calls to once(G) with (G -> true) as XSB does not inline the former. 

Hence we show that the performance of CHR without tabling is not compromised
in XSB. To do so we compare the results of eight benchmarks [17] in
XSB with those in hProlog, the origin of the CHR preprocessor, and SICStus,
the origin of the runtime system and one of the first systems to feature CHR.

The following causes for performance differences are to be expected: 

— Firstly, we expect the outcome to be most heavily influenced by the relative 
performance difference on Prolog code as the CHR rules are compiled to 
Prolog: we have observed that plain Prolog in SICStus is on average 1.39 
times faster than in XSB and hProlog is 1.85 times faster than XSB.

— Secondly, the results may be influenced by the slightly more powerful optimizations
of the CHR preprocessor in hProlog and XSB. To eliminate these
effects we have disabled all advanced optimizations not performed by the
SICStus CHR compiler. In addition, the checking of guard bindings has
been disabled in both systems. This does not affect the benchmarks, since
no binding or instantiation errors occur in the guards. This increases the
fairness of comparison since hProlog’s analysis of redundant checks is more
powerful and it does not intercept instantiation errors.

— Thirdly, the implementation and representation of attributed variables dif- 
fers between the three systems. The global constraint store of CHR is rep- 
resented as an attributed variable and it may undergo updates each time a 
new constraint is imposed or a constraint variable gets bound. Hence, the 
complexity and efficiency of accessing and updating attributed variables eas- 
ily dominates the overall performance of a CHR program if care is not taken. 
Especially the length of reference chains has to be kept short, as otherwise
the cost of dereferencing the global store may easily grow out of
bounds.




Constraint Handling Rules and Tabled Execution 






In hProlog much care has been taken in choosing a representation and corresponding
implementation of the necessary operations, to take these considerations
into account. SICStus and XSB have different low-level representations
that do not allow the cost of dereferencing to be limited equally well.
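
The cost of long reference chains mentioned in the last item can be seen in a toy model. The Python sketch below is an assumption-laden illustration, not the representation used by any of the three systems: Ref, build_chain and deref_steps are hypothetical names, and each update of the store is modelled as adding one more indirection in front of it.

```python
class Ref:
    """A reference cell pointing either at another cell or at the final value."""
    def __init__(self, target):
        self.target = target


def build_chain(length):
    """Wrap the stored value in `length` extra levels of indirection."""
    cell = Ref("store")
    for _ in range(length):
        cell = Ref(cell)
    return cell


def deref_steps(ref):
    """Follow the chain to its last cell and count the hops taken."""
    steps = 0
    while isinstance(ref.target, Ref):
        ref = ref.target
        steps += 1
    return ref, steps


last_cell, hops = deref_steps(build_chain(1000))
```

Every access pays one hop per link, so if updates are allowed to lengthen the chain, the cost of each access grows with the number of updates performed so far.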

Table 1 shows the results for the benchmarks. All measurements have been 
made on an Intel Pentium 4 2.00 GHz with 512 MB of RAM. Timings are in 
milliseconds. The Prolog systems used are hProlog 2.4, SICStus 3.11.0 and our 
extension of XSB 2.6. 

From the results we learn that hProlog is the fastest for CHR, as was the
case for plain Prolog. Relative to hProlog, both SICStus and XSB are much slower
for CHR than they are for plain Prolog. However, on average XSB is SICStus’s equal.
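
The relative figures and averages reported in Table 1 can be recomputed from the runtimes. The short Python sketch below does so; the runtimes are copied from the table, while the variable and function names are ours.

```python
# Runtimes in milliseconds from Table 1: (hProlog, SICStus, XSB).
runtimes_ms = {
    "bool":      (870, 2010, 1729),
    "fib":       (1240, 1620, 2509),
    "fibonacci": (1510, 4260, 3500),
    "leq":       (1020, 1250, 1889),
    "primes":    (960, 1870, 2609),
    "ta":        (1190, 2240, 2378),
    "wfs":       (850, 1620, 2039),
    "zebra":     (1500, 7050, 3578),
}


def relative_percentages(column):
    """Runtime of the given column as a percentage of the hProlog runtime."""
    return [100.0 * row[column] / row[0] for row in runtimes_ms.values()]


def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


sicstus_avg = average(relative_percentages(1))
xsb_avg = average(relative_percentages(2))
```

Rounded to one decimal place, the two averages agree with the last row of the table, which is the basis for calling XSB the equal of SICStus on these benchmarks.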

The huge difference between hProlog and the other two systems cannot simply
be explained by the difference in runtimes of plain Prolog. The representation
of attributed variables and the implementation of operations on them seem to
play a very important role in most of the benchmarks. The more efficient implementation
of hProlog and XSB’s CHR runtime and generated code mostly
explains the smaller gap between SICStus and XSB.

All in all, XSB is not far behind SICStus in performance of CHR programs. 
This is quite remarkable considering that SICStus is clearly faster for plain 
Prolog programs. Future optimizations of the XSB CHR compiler may improve 
XSB’s CHR performance even further. However, it is certainly worthwhile to 
consider changing the low-level implementation of attributed variables in XSB 
to follow more closely that of hProlog. Such a change would be orthogonal to 
the CHR system and other applications of attributed variables would benefit as 
well. 

Table 1. Runtime performance of 8 CHR benchmarks in hProlog, SICStus and XSB.

              hProlog              SICStus              XSB
Benchmark     runtime  relative    runtime  relative    runtime  relative
bool              870    100.0%      2,010    231.0%      1,729    198.7%
fib             1,240    100.0%      1,620    130.6%      2,509    202.3%
fibonacci       1,510    100.0%      4,260    282.1%      3,500    231.8%
leq             1,020    100.0%      1,250    122.5%      1,889    185.2%
primes            960    100.0%      1,870    194.8%      2,609    271.8%
ta              1,190    100.0%      2,240    188.2%      2,378    199.8%
wfs               850    100.0%      1,620    190.6%      2,039    239.9%
zebra           1,500    100.0%      7,050    470.0%      3,578    238.5%
average             -    100.0%          -    226.2%          -    221.0%

3 CHR and Tabled Execution 

The main challenge of introducing CHR in XSB is integrating the forward chain- 
ing fixpoint computation of the CHR system with the backward chaining fixpoint 
computation of tabled resolution. 
A similar integration problem has been solved in [4], which describes a framework
for constraint solvers written with attributed variables for XSB. The name
Tabled Constraint Logic Programming (TCLP) is coined in that publication.

The main difference for the programmer between CHR and attributed variables
for developing constraint solvers, i.e. the fact that CHR is a much higher
level language, should be carried over to the tabled context. Hence tabled CHR
should be a more convenient paradigm for programming constraint solvers
than TCLP with attributed variables.

Indeed, the internals that are presented in the following can easily be hidden 
from the user. All the user then needs to supply is the following declaration: 

:- table_chr f(_,_) with Options.

meaning that the predicate f/2 should be tabled and it involves CHR constraints.
A list of additional options may be provided. The meaning of this declaration 
and the possible options are explained in the rest of this section. 

In [4] the general framework specifies three operations to control the tabling 
of constraints: call abstraction, entailment checking of answers and answer pro- 
jection. It is left up to the constraint solver programmer to implement these 
operations with respect to his solver implementation. 

In the following we formulate these operations in terms of CHR. The oper- 
ations are covered in significant detail as the actual CHR implementation and 
the representation of the global CHR constraint store are taken into account. 
Problems that have to be solved time and again for attributed variable con- 
straint solvers are solved once and for all for CHR constraint solvers. Hence 
integrating a particular CHR constraint solver requires much less knowledge of 
implementation intricacies and decisions can be made on a higher level. 

The different steps in handling a call to a tabled predicate are depicted in
Figure 1. The different steps are explained in the rest of this section. Section 3.1
covers the representation of CHR constraint stores in tables. Next, Section 3.2
explains how a call to a tabled predicate is abstracted and how the answer
to the abstracted call is related to the original call. Section 3.3 proposes a high-level
way of specifying projection and Section 3.4 discusses entailment. Finally,
Section 3.5 finishes with an evaluation of tabling with several optimizations on
a small shipment problem.

3.1 Tabled Store Representation 

As constraints can be part of a call or an answer of a tabled predicate, some 
representation for them in call and answer tables is needed. In general, this rep- 
resentation should allow comparing the constraints in a call with all call patterns 
in the call table to select either a variant (under variant based tabling) or the 
most specific generalization (under subsumption based tabling). Furthermore, it 
should be possible to convert from the ordinary representation of the constraints 
and back, for insertion into the call table and retrieval from the answer table. 

In the following we investigate two possible representations and discuss the 
complexity of variant checking. 
Fig. 1. Flowchart.



Two tabled CHR store representations. Before considering constraint store 
representations for tables, it is necessary to know a little about the ordinary 
representation of the CHR constraint store. The global CHR constraint store is 
an updatable term, containing suspended constraints grouped by functor. Each 
suspended constraint is represented as a suspension term, including the following 
information: 

— the unique ID for sorting and equality testing;

— the goal to execute when triggered; this goal contains the suspension itself
as an argument, hence creating a cyclic term;

— the propagation history, containing for each propagation rule the tuple of
identifiers of other constraints that this constraint has interacted with.

Furthermore, variables involved in the suspended constraints behave as indexes 
into the global store: they have the suspensions stored in them as attributes. 

One piece of information that is not maintained during the normal CHR 
execution is the order in which constraints have been called. We will not do so 
for the sake of tabling either as this would overly complicate matters. Hence, we 
need to restrict ourselves to confluent¹ programs to avoid unpredictable behavior.

Two different tabled CHR store representations have been explored:
the suspension representation and the naive representation. A discussion of their 
respective merits and weaknesses as well as an evaluation follow. 

Suspension representation. Here we aim to keep the tabled representation as 
close as possible to the ordinary representation. The idea here is to maintain 
the propagation history of the tabled constraints. In that way no unnecessary 
re-firing of propagation rules will occur after the constraints have been retrieved 
from the table. 

1 A set of rules is confluent if the order in which the constraints are imposed does not
matter for the final state of the constraint store.

However, it is not possible to just store the ordinary constraint suspensions
in the table as they are. Firstly, the tables do not deal with cyclic terms. This
can be dealt with by breaking the cycles before storage and resetting them
after fetching. Secondly, the unique identifiers have to be replaced after fetching
by fresh ones as multiple calls would otherwise create multiple copies of the
same constraints all with identical identifiers. Fortunately, attributed variables
themselves can be stored in tables (see [5]).
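
The two adjustments described above, breaking cycles before storage and handing out fresh identifiers after fetching, can be sketched as follows. This Python model is hypothetical: Suspension, to_table and from_table are illustrative names, and the way identifiers inside the propagation history are renamed is an assumption of the sketch rather than something stated in the text.

```python
import itertools

_fresh_ids = itertools.count(1)


class Suspension:
    """Toy suspension: unique id, constraint, propagation history, cyclic goal."""
    def __init__(self, constraint, history=()):
        self.id = next(_fresh_ids)
        self.constraint = constraint
        self.history = set(history)       # tuples of ids, one per fired rule instance
        self.goal = ("trigger", self)     # refers back to the suspension: a cycle


def to_table(suspensions):
    """Break the cycles: store only acyclic data (old id, constraint, history)."""
    return [(s.id, s.constraint, frozenset(s.history)) for s in suspensions]


def from_table(entries):
    """Rebuild suspensions with fresh ids and rename the ids inside the histories."""
    rebuilt = {old_id: Suspension(constraint) for old_id, constraint, _ in entries}
    for old_id, _, history in entries:
        rebuilt[old_id].history = {
            tuple(rebuilt[i].id for i in ids)
            for ids in history
            if all(i in rebuilt for i in ids)   # assumption: drop dangling entries
        }
    return list(rebuilt.values())


a = Suspension(("a", 2))
b = Suspension(("a", 1))
a.history.add((a.id, b.id))
stored = to_table([a, b])
first = from_table(stored)
second = from_table(stored)
```

Fetching the same table entry twice yields two copies whose identifiers do not clash with each other or with the originals, while each copy keeps a history that is consistent with its own new identifiers.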

Naive representation. The naive representation aims at keeping the information 
in the table in as simple a form as possible: for each suspended constraint only 
the goal to pose this constraint is retained in the table. It is easy to create this 
goal from a suspension and easy to merge this goal back into another constraint 
store: it needs only to be called. 

When necessary the goal will create a suspension with a fresh unique ID 
and insert it into the constraint store. However in many cases it may prove 
unnecessary to do so because of some simplification through interaction with 
constraints in the calling environment. 

The only information that is lost in this representation is the propagation 
history. This may lead to multiple propagations for the same combination of 
head constraints. For this to be sound, a further restriction on the CHR rules 
is required: they should behave according to set semantics, i.e. the presence 
of multiple identical constraints should not lead to different answers modulo 
identical constraints. 



Evaluation of both representations. To measure the relative performance of the 
two presented representations, consider the following two programs: 



prop                              simp

:- constraints a/1.               :- constraints a/1, b/1.

a(0) <=> true.                    b(0) <=> true.
a(N) ==> N > 0                    b(N) <=> N > 0
    | M is N - 1, a(M).               | a(N), M is N - 1, b(M).

p(N) :- a(N).                     p(N) :- b(N).



For both programs the predicate p(N) puts the constraints a(1)...a(N) in the
constraint store. The prop program uses a propagation rule to achieve this while 
the simp program uses an auxiliary constraint b/1. The non-tabled version of 
the query p(N) or a(N) has time complexity O(N) for both the simp and the 
prop program. 
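
As a rough check of this claim, the two programs can be simulated by hand. The Python sketch below is not a CHR engine: run_prop and run_simp are hypothetical functions that hard-code the rules of the two programs for a non-negative N and count rule firings.

```python
def run_prop(n):
    """prop: a(0) <=> true.  a(N) ==> N > 0 | M is N - 1, a(M)."""
    store, firings = [], 0
    agenda = [n]                      # constraints still to be called
    while agenda:
        k = agenda.pop()
        firings += 1
        if k == 0:
            continue                  # simplification rule removes a(0)
        store.append(("a", k))        # a(k) stays in the store
        agenda.append(k - 1)          # propagation rule adds a(k-1)
    return store, firings


def run_simp(n):
    """simp: b(0) <=> true.  b(N) <=> N > 0 | a(N), M is N - 1, b(M)."""
    store, firings = [], 0
    k = n
    while True:
        firings += 1
        if k == 0:
            break                     # b(0) is simplified away
        store.append(("a", k))        # b(k) is replaced by a(k) and b(k-1)
        k -= 1
    return store, firings


prop_store, prop_firings = run_prop(5)
simp_store, simp_firings = run_simp(5)
```

Both simulations end with a(1)...a(N) in the store after N + 1 rule firings, consistent with the O(N) time complexity stated for the non-tabled query.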

The two possible representations for the answer constraint store can be spec- 
ified in the tabling declaration as follows: 

:- table_chr p(_) with [representation(suspension)].

and

:- table_chr p(_) with [representation(naive)].

Table 2 gives the results for the tabled query p(400): runtime in milliseconds
and space usage of the tables in bytes. For both programs the answer table 
contains the constraint store with the 400 a/1 constraints. 
Table 2. Evaluation of the two tabled store representations.

           representation(suspension)    representation(naive)
program    runtime        space          runtime        space
prop           150    2,153,100            1,739      270,700
simp           109    1,829,100               89      270,700


Most of the space overhead is due to the difference in representation: a
suspension contains more information than a simple call. However, the difference
is more or less a constant factor. The only part of a suspension that can in
general have a size greater than O(1) is the propagation history, which for prop
is limited to remembering that the propagation rule has been used. For the simp
program the propagation history is always empty.

The runtime of the prop version with the suspension representation is
considerably better than that of the version with the naive representation. In
fact, there is a complexity difference. When the answer is retrieved from the
table for the suspension representation, the propagation history prevents
re-propagation. Hence answer retrieval is O(N). For the naive representation on
the other hand, every constraint a(I) from the answer will start propagating and
the complexity of answer retrieval becomes O(N^2).

On the other hand, for simp propagation history plays no role. The runtime 
overhead is mostly due to the additional overhead of the pre- and post-processing 
of the suspension representation as opposed to the simpler form of the naive 
representation. 
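
The complexity gap for prop can be made concrete with a small model. The
following Python sketch is not the XSB implementation: it only counts how often
the propagation rule of prop would fire while an answer store holding
a(1), ..., a(N) is re-installed. The function names and the set-based store are
assumptions made for illustration.

```python
def retrieve_naive(n):
    """Re-install a(1)..a(n) by calling each goal, without propagation history."""
    store, firings = set(), 0
    for i in range(1, n + 1):
        k = i
        # Calling a(k) fires  a(K) ==> K > 0 | M is K - 1, a(M)  all the way
        # down to a(0), which is then removed by  a(0) <=> true.
        while k > 0:
            store.add(k)      # set semantics: duplicates collapse
            firings += 1
            k -= 1
    return store, firings


def retrieve_suspension(n):
    """Re-install a(1)..a(n) from suspensions whose history blocks the rule."""
    store, firings = set(), 0
    for i in range(1, n + 1):
        store.add(i)          # the recorded history prevents re-propagation
    return store, firings
```

Under this model both functions rebuild the same store, but the naive variant
performs N(N+1)/2 rule firings where the suspension variant performs none, which
is the O(N^2) versus O(N) behaviour discussed above.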

Variant checking. The need to check whether two constraint stores are variants 
of each other may arise at two occasions: 

— With no or only partial call abstraction (see Section 3.2) a constraint store 
is part of the call to the tabled predicate. The tabling system then needs to 
check whether a previous call with a variant of that constraint store appears 
in a table. If that is the case, the answer to the previous call can be reused. 

— A limited form of entailment checking (see Section 3.4) is to check whether 
a new answer constraint store is a variant of any previous answer constraint 
store for the same call. In that case the new answer can be discarded. 

We can consider this equality checking with the previously presented naive tabled 
representation of constraints. In that representation the tabled constraints are 
kept as a list of goals that impose the constraints. Any permutation of this list 
represents the same constraint store. If two constraint stores are identical modulo 
variable renaming, then they are variants. 

In general, variant checking of constraint stores has exponential complexity. A 
naive algorithm would be to consider all permutations of one constraint store. If 
any one of the permutations equals the other constraint store, both are identical. 
With heuristics this algorithm could be improved and for particular constraints 
or even applications algorithms with a better complexity may exist. However 
further exploration falls outside of the scope of this paper. The problem can be
ignored altogether, with possible duplication in tables as a consequence, or only 
partially tackled, e.g. by simple sorting and pattern matching. 
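
The naive permutation-based variant check can be sketched as follows. This is a
minimal Python model, not code from the CHR system: constraints are tuples such
as ("leq", "X", "Y"), and strings starting with an upper-case letter are assumed
to stand for variables.

```python
from itertools import permutations


def is_var(term):
    """Variables are modelled as strings starting with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def same_up_to_renaming(goals1, goals2):
    """Position-wise comparison under one consistent, bijective renaming."""
    fwd, bwd = {}, {}
    for g1, g2 in zip(goals1, goals2):
        if len(g1) != len(g2) or g1[0] != g2[0]:
            return False
        for a, b in zip(g1[1:], g2[1:]):
            if is_var(a) and is_var(b):
                if fwd.setdefault(a, b) != b or bwd.setdefault(b, a) != a:
                    return False
            elif a != b:
                return False
    return True


def variant_stores(store1, store2):
    """Try every permutation of store1: exponential in the worst case."""
    if len(store1) != len(store2):
        return False
    return any(same_up_to_renaming(p, store2) for p in permutations(store1))
```

For example, the stores [leq(X,Y), leq(Y,Z)] and [leq(B,C), leq(A,B)] are
reported as variants, whereas [leq(X,Y), leq(Y,X)] and [leq(A,B), leq(B,C)] are
not.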



3.2 Call Abstraction 

Call abstraction replaces the called goal with a call to a more general goal fol- 
lowed by an operation that ensures that only the answer substitutions applica- 
ble to the particular call are retained. At the level of plain Prolog, abstraction 
means not passing certain bindings in to the call. E.g. p(q,A) can be abstracted 
to p(Q,A). This goal has then to be followed by Q = q to ensure that only the 
appropriate bindings for A are retained. 

In XSB call abstraction is a means to control the number of tables. When a 
predicate is called with many different instantiation patterns, a table is generated 
for each such call instantiation pattern. Thus it is possible that the information 
for the same fully instantiated call is present many times in tables for different 
call instantiation patterns. However, this duplication in the tables can be avoided 
by using call abstraction to restrict to a small set of call instantiation patterns. 
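
The effect of call abstraction on the number of tables can be illustrated with
a toy model. The Python sketch below is only an analogy for the Prolog-level
mechanism: the fact list, the table dictionary and the function names are
hypothetical.

```python
FACTS = [("q", 1), ("q", 2), ("r", 3)]   # hypothetical facts for p/2
tables = {}                               # one entry per tabled call pattern


def tabled_p_general():
    """Tabled evaluation of the abstracted call p(Q, A)."""
    if "p(Q,A)" not in tables:
        tables["p(Q,A)"] = list(FACTS)    # computed once, then reused
    return tables["p(Q,A)"]


def p(q):
    """Answers for p(q, A): call p(Q, A), then keep the answers with Q = q."""
    return [a for (q_val, a) in tabled_p_general() if q_val == q]
```

Calling p("q") and then p("r") leaves a single table for the pattern p(Q,A)
instead of one table per instantiation of the first argument.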

For constraint logic programming, call abstraction can be extended from 
bindings to constraints: abstraction means removing some of the constraints on 
the arguments. Consider for example the call p(Q,A) with constraint Q leq N
on Q. This call can be abstracted to p(Q',A), followed by Q'=Q to reintroduce
the constraint.

Abstraction is especially of value for those constraint solvers where the num- 
ber of constraints on a variable can be much larger than the number of different 
bindings for that variable. Consider for example a finite domain constraint solver 
with constraint domain/2, where the first argument is a variable and the second 
argument the set of its possible values. If the variable can be bound to at most
n values, it can take as many as $2^n$ different domain/2 constraints, one for
each subset of values. Thus many different tables would be needed to cover every
possible call pattern.

Varying degrees of abstraction are possible and may depend on the particular 
constraint system or application. Full constraint abstraction, i.e. the removal of 
all constraints from the call, is generally more suitable for CHR for the following 
reasons: 

— CHR rules do not require constraints to be on variables. They can be on 
ground terms or atoms as well. It is not straightforward to define abstraction 
for ground terms as these are not necessarily passed in as arguments but can 
just as well be created inside the call. Hence there is no explicit link with the 
call environment, while such a link is needed for call abstraction. As such, 
only no abstraction or full constraint abstraction seem suitable for CHR. 

— Full constraint abstraction is preferable when the previously mentioned table 
blow-up is likely. 

— As mentioned in the previous section, variant checking of constraint stores 
can have exponential complexity. 

Moreover, it may be quite costly for certain constraint domains to sort out
what constraints should be passed in to the call or abstracted away, involving 
transitive closure computations of reachability through constraints. Hence often 
full abstraction is cheaper than partial abstraction. 

For CHR full abstraction requires the execution of the tabled predicate with 
an empty constraint store. If the call environment constraint store were used, 
interaction with new constraints would violate the assumption of full abstraction. 
The code below shows how a predicate p/1 that requires tabling:

:- table p/1.
p(X) :- ...

is transformed into two predicates, where the first one is called, takes care of the
abstraction, calls the second predicate and afterwards combines the answer with
the previously abstracted away constraints.

p(X) :-
    current_chr_store(CallStore),       % save the call environment store
    set_empty_chr_store,                % full abstraction: start empty
    tabled_p(X1, AnswerStore),
    set_chr_store(CallStore),           % restore the call environment store
    insert_answer_store(AnswerStore),
    X1 = X.

:- table tabled_p/2.
tabled_p(X,S_A) :- ...

The further implementation of tabled_p will be discussed in the next section.
The given answer constraints are merged into the current global CHR constraint
store by the predicate insert_answer_store/1. Given the naive representation
discussed in the previous section, this boils down to calling a list of goals to
impose the constraints.
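
The order of the store operations in the transformed predicate can be modelled
as follows. This Python sketch only mimics the control flow; the class, the
list-based store and the made-up answer produced by tabled_p are assumptions,
not part of the CHR system.

```python
class TabledChr:
    """Toy model of the p/1 wrapper; the store is a list of constraint goals."""

    def __init__(self, call_store):
        self.store = list(call_store)   # models the global CHR constraint store
        self.table = {}                 # models the table of tabled_p/2

    def tabled_p(self, x):
        # Full abstraction: the tabled predicate sees no call constraints.
        assert self.store == []
        if x not in self.table:
            # Made-up answer store in the naive (list of goals) form.
            self.table[x] = [("leq", 0, x)]
        return self.table[x]

    def insert_answer_store(self, answer_store):
        # Naive representation: impose each tabled constraint by calling it;
        # in this model "calling" simply appends the goal to the store.
        for goal in answer_store:
            self.store.append(goal)

    def p(self, x):
        call_store = self.store                  # current_chr_store(CallStore)
        self.store = []                          # set_empty_chr_store
        answer_store = self.tabled_p(x)          # tabled_p(X1, AnswerStore)
        self.store = call_store                  # set_chr_store(CallStore)
        self.insert_answer_store(answer_store)   # insert_answer_store(AnswerStore)
```

Starting from a call store containing leq(x,10), a call to p("x") ends with a
store that holds both the original constraint and the tabled answer leq(0,x).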

The above transformation is straightforward and can easily be automated,
hiding the internals of the CHR-tabling integration from the user. All the user
needs to supply are the arguments that are CHR constraint variables. We propose
the following declaration:

:- table_chr f(_,chr) with Options. 

meaning that the predicate f/2 should be tabled, its first argument is an ordinary
Prolog variable and its second argument is a CHR constraint variable of which
all the constraints are abstracted away.

3.3 Answer Projection 

Often one wants to project the answer constraint store on the non-local variables 
of the call. The usual motivation is that constraints on local variables are mean- 
ingless outside of the call. The constraint system should be complete so that no 
unsatisfiable constraints can be lost through projection. 

For tabling there is an additional and perhaps even more pressing motivation 
for projection: a predicate with an infinite number of different answers may be 
turned into one with just a finite number of answers by throwing away the 
constraints on local and unreachable variables. 







Tom Schrijvers and David S. Warren 



In some cases it may suffice to look at the constraints in the store separately 
and given a set of non-local variables to decide whether to keep the constraint 
or not. In those cases it may be convenient to exploit the operational semantics 
of CHR and implement projection as a project/1 constraint with the list of 
variables on which to project as an argument. Simpagation rules can then be 
used to look at and decide what constraints to remove. A final simplification 
rule at the end can be used to remove the project/1 constraint from the store. 

The following example shows how to project away all leq/2 constraints that 
involve arguments not contained in a given set Vars: 

project(Vars) \ leq(X,Y) <=> \+ (member(X,Vars), member(Y,Vars)) | true.
project(Vars) <=> true.

Besides removal of constraints, more sophisticated operations such as weakening
are possible. E.g. consider a set solver with two constraints: in/2, which requires
an element to be in a set, and nonempty/1, which requires a set to be non-empty.
The rules for projection could include the following weakening rule:

project(Vars) \ in(Elem,Set) <=>
    member(Set,Vars), \+ member(Elem,Vars) | nonempty(Set).
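
The net effect of the two projection rules on a store can be sketched outside
CHR. The Python model below treats constraints as tuples and variables as
strings; it ignores the operational ordering issues of the CHR version, and
leaving constraints untouched when neither rule applies is an assumption.

```python
def project(store, variables):
    """Apply the two projection rules to a list of constraint tuples."""
    projected = []
    for constraint in store:
        name, args = constraint[0], constraint[1:]
        if name == "leq":
            # Removal rule: drop leq(X, Y) unless both X and Y are projected on.
            if args[0] in variables and args[1] in variables:
                projected.append(constraint)
        elif name == "in" and args[1] in variables and args[0] not in variables:
            # Weakening rule: in(Elem, Set) becomes nonempty(Set).
            projected.append(("nonempty", args[1]))
        else:
            projected.append(constraint)   # no rule applies: keep as is
    return projected
```

Projecting [leq(X,Y), leq(X,L), in(E,S), in(X,S)] on the variables X, Y and S
keeps leq(X,Y) and in(X,S), drops leq(X,L), and weakens in(E,S) to nonempty(S).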

The predicate tabled_p would then look like: 

tabled_p(X,S_A) :-
    orig_p(X),
    project([X]),
    extract_store_representation(S_A).

Here the predicate extract_store_representation/1 converts from the ordinary
global store representation to the naive tabled store representation discussed
in Section 3.1.

This approach is of course not general in the sense that certain constraint 
domains may need more information than just the variables to project on, such 
as more intricate knowledge of the contents of the constraint store. In addition 
it relies on operational semantics and ordering of constraints. However, it is a 
rather compact and high level notation and as such it might be possible to infer 
conditions on its usage under which the technique is provably correct. 

The project/1 call could easily be added by the automatic transformation 
of the tabled predicate if the user supplies the projection option: 

:- table_chr p(chr) with [projection].



3.4 Entailment Checking and Other Answer Combinations 

In some cases some of the answers computed for a tabled predicate are redundant 
and so need not be saved. Indeed there are cases in which for any tabled call 
only one answer needs to be maintained. Consider for example that the answer 
p(a,X) is already in the table of predicate p/2. Now a new answer, p(a,b) is 
found. This new answer is redundant as it is covered by the more general p(a,X) 
that is already in the table. Hence it is logically valid to not record this answer 
in the table, but to simply discard it. This does not affect the soundness or 
completeness of the procedure. 

The idea of this answer subsumption technique is to reduce the number of
answers in the table by replacing two (or more) answers by a single answer.
Logically, the single new answer has to be equivalent to the disjunction of the
replaced answers. For ordinary tabled Prolog, each answer $H_i$ can be seen as a
Herbrand constraint Answer = Term, e.g. Answer = p(a,X). Now, for any two
of these Herbrand constraints $H_0$ and $H_1$ the following two properties hold:

1. If the disjunction is equivalent to another constraint, that constraint is
equivalent to one of the two constraints.

$\exists H : H = H_0 \vee H_1 \Longleftrightarrow \exists i \in \{0,1\} : H = H_i$

2. If the conjunction is equivalent to one of the two constraints, the disjunction
is equivalent to the other.

$\exists i \in \{0,1\} : H_0 \wedge H_1 = H_i \Longrightarrow H_0 \vee H_1 = H_{1-i}$

These two properties suggest a possible strategy to compute the equivalent 
single answer of two answers: check whether the conjunction of two answers is 
equivalent to one of the two, then the other is the single equivalent answer. 
Otherwise, there is no single equivalent answer. 
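
For Herbrand answers this strategy can be sketched in a few lines. The Python
model below represents terms as nested tuples and variables as upper-case
strings, and it uses one-way matching instead of unification followed by a
variant check. For answers that share no variables the two tests should
coincide, but that substitution is our simplification, not the tabling engine's
actual procedure.

```python
def is_var(term):
    """Variables are modelled as strings starting with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def subsumes(general, specific, binding=None):
    """True if some substitution on `general` yields exactly `specific`."""
    binding = {} if binding is None else binding
    if is_var(general):
        if general in binding:
            return binding[general] == specific
        binding[general] = specific
        return True
    if isinstance(general, tuple):
        return (isinstance(specific, tuple)
                and len(general) == len(specific)
                and all(subsumes(g, s, binding)
                        for g, s in zip(general, specific)))
    return general == specific


def combine(answer0, answer1):
    """Return the single equivalent answer, or None if there is none."""
    if subsumes(answer0, answer1):
        return answer0          # answer0 AND answer1 is equivalent to answer1
    if subsumes(answer1, answer0):
        return answer1
    return None
```

With these definitions, p(a,X) and p(a,b) combine to p(a,X), while p(a,b) and
p(a,c) have no single equivalent answer.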

We can include CHR constraints in the same logically sound idea of answer
subsumption. This path length computation will serve as an illustration:

dist(A,B,D) :- edge(A,B,D1), leq(D1,D).
dist(A,B,D) :- dist(A,C,D1), edge(C,B,D2), leq(D1 + D2, D).

Assume appropriate rules for the leq/2 constraint in the above program,
where leq means less-than-or-equal. The semantics are that dist(A,B,D) holds
if there is a path from A to B of length less than or equal to D. In other words, D
is an upper bound on the length of a path from A to B.

If there is an answer dist(n1,n2,D) :- leq(d1,D) already in the table and
a new answer dist(n1,n2,D) :- leq(d2,D), where d1 =< d2, is found, then
this new answer is redundant. Hence it can be discarded. Again this does not
affect the soundness, since logically the same answers are covered.

Operationally, the same strategy as proposed for ordinary Prolog can be used
to reduce two answer constraint stores $s_0$ and $s_1$ to a single answer store
$s$. At the end of the tabled predicate we merge a previous answer store $s_0$
with a new answer store $s_1$. After merging, the store will be simplified and
propagated to $s$ by the available CHR rules. This combines the two answers into
a new one. This mechanism can be used to check entailment of one answer by the
other: if the combined answer store $s$ is equal to one of the two, then that
answer entails the other:
$s_0 \wedge s_1 = s_i\ (i \in \{0,1\}) \Longrightarrow s_{1-i} = s_0 \vee s_1$.

The predicate insert_answer_store/1, mentioned in Section 3.2, can be
used for the conjunction of two constraint stores. We assume that one store is
the current global CHR constraint store.

When the above two answers of the dist/3 predicate are merged, the following
leq/2 rule will simplify the constraint store to retain the more general
answer:

leq(X,D1) \ leq(X,D2) <=> D1 =< D2 | true.







Note that the dist/3 program would normally generate an infinite number
of answers for a cyclic graph, logically correct but not terminating. However, if it
is tabled with answer subsumption, it does terminate for non-negative weights.
Not only does it terminate, it produces only one answer, namely
dist(n1,n2,D) :- leq(d,D) with d the length of the shortest path. Indeed, the
predicate only returns the optimal answer.

The above strategy is a sound approach to finding a single constraint store
that is equivalent to two others. However, it is not complete: a single constraint
store may be equivalent to the disjunction of two others, while it is not equivalent
to one of the two. This is because the first property for the Herbrand constraints
does not hold for all constraint solvers, e.g. $leq(X,Y) \vee leq(Y,X) = true$.
Nevertheless it is a rather convenient strategy, since it does not require any
knowledge of the particularities of the constraint solver used. That makes it a
good choice for the default strategy for CHR answer subsumption. Better
strategies may be supplied for particular constraint solvers.

For some applications one can combine answers with answer generalization,
which does not preserve logical correctness. An example in regular Prolog
would be to have two answers p(a,b) and p(a,c) and to replace the two of them
with one answer p(a,X). This guarantees (for positive programs) that no answers
are lost, but it may introduce extraneous answers. A similar technique is possible
with constrained answers. While this approach is logically unsound, it may be
acceptable for some applications if the overall correctness of the program is not
affected. An example is the use of the least upper bound operator to combine
answers in the tabled abstract interpretation setting of [2].

In summary, two additional options can be supplied to extend the automatic 
transformation: 

- canonical_form(PredName) specifies the name of the predicate that should
compute the (near) canonical form of the answer constraint store. This
canonical form is used to check equivalence of two constraint stores.

- answer_combination(PredName) specifies the name of the predicate that
should compute the combination of two answers, if they can be combined.
The value default selects the above-mentioned default strategy.

A subsumption-based optimization technique. The technique used in the dist/3 
program is to replace the computation of the exact distance of a path with 
the computation of an upper bound on the distance via constraints. Then, by 
tabling the predicate and performing answer subsumption, the defining predicate 
has effectively been turned into an optimizing one, computing the length of the 
shortest path. It is a straightforward yet powerful optimization technique that 
can be applied to other defining predicates as well, turning them into optimizing 
predicates with a minimum of changes. No meta-programming to iterate over all 
possible answers is required. 
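
The way answer subsumption turns dist/3 into an optimizing predicate can be
imitated with a simple fixpoint computation. The Python sketch below is a model
under stated assumptions, not the XSB evaluation strategy: the table maps a pair
(A,B) to the single retained bound d of the answer dist(A,B,D) :- leq(d,D), and
the function name and edge format are hypothetical.

```python
def tabled_dist(edges):
    """edges: list of (from, to, weight) triples with non-negative weights."""
    table = {}   # (a, b) -> d, standing for the answer dist(a,b,D) :- leq(d,D)

    def add_answer(a, b, d):
        old = table.get((a, b))
        if old is not None and old <= d:
            return False          # new answer is less general: discard it
        table[(a, b)] = d         # new answer is more general: replace the old one
        return True

    changed = True
    while changed:                # iterate until no new answer is recorded
        changed = False
        for a, b, w in edges:                     # first clause of dist/3
            changed |= add_answer(a, b, w)
        for (a, c), d1 in list(table.items()):    # second clause of dist/3
            for c2, b, w in edges:
                if c2 == c:
                    changed |= add_answer(a, b, d1 + w)
    return table
```

On the cyclic graph with edges n1-n2 (5), n1-n3 (1), n3-n2 (2) and n2-n1 (1),
the computation terminates and the table keeps the bound 3 for the pair
(n1,n2), the length of the shortest path.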

3.5 Evaluation of a Shipment Problem 

Problem statement: There are N packages available for shipping using trucks. 
Each package has a weight and some constraints on the time to be delivered. Each 
truck has a maximum load and a destination. Determine whether there is a subset
of the packages that can fully load a truck destined for a certain place and all 
the packages in this subset are delivered on time. 

The problem is solved by the truckload program (see [17] for the source). 
Packages are represented by clauses of pack/4, e.g. 

pack(3,60,Chicago,T) :- leq(4,T), leq(T,29).

means this is the third package, it weighs 60 pounds, is destined for Chicago and
has to be delivered between the 4th and the 29th day. The truckload/4 predicate
computes the answer to the problem, e.g. truckload(30,100,Chicago,T)
computes whether a subset of the packages numbered 1 to 30 exists to fill up 
a truck with a maximum load of 100 pounds destined for Chicago. The time 
constraints are captured in the bound on the constraint variable T. There may 
be multiple answers to this query, if multiple subsets that satisfy it exist. 

We have run the program in four different modes: 

— Firstly, the program is run as is without tabling. 

— Secondly, to avoid the recomputation of subproblems in recursive calls the 
truckload/4 predicate is tabled with: 

    :- table_chr truckload(_,_,_,chr) with [representation(naive)].

— In a third variant the answer store is canonicalized by simple sorting such 
that permutations are detected to be identical answers: 

    :- table_chr truckload(_,_,_,chr)
           with [representation(naive),canonical_form(sort)].

— Finally, in the fourth variant we apply a custom combinator to the answers: 
two answers with overlapping time intervals are merged into one answer with 
the union of the time intervals. For example the disjunction of the following 
two intervals on the left is equivalent to the interval on the right: 

$(1 < T < 3) \vee (2 < T < 4) \Longleftrightarrow (1 < T < 4)$

This variant is declared as, with interval_union/3 the custom answer com- 
binator: 

    :- table_chr truckload(_,_,_,chr)
           with [representation(naive),answer_combination(interval_union)].
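
A custom combinator of this kind can be sketched as follows. The source of
interval_union/3 is not given here, so this Python version is a guess at its
behaviour: it assumes that each answer is a closed interval (lo, hi) on T, as
produced by constraints such as leq(4,T), leq(T,29), and the helper add_answer
is hypothetical.

```python
def interval_union(answer0, answer1):
    """Merge two closed intervals if they overlap, otherwise return None."""
    (lo0, hi0), (lo1, hi1) = answer0, answer1
    if lo0 <= hi1 and lo1 <= hi0:                 # the intervals overlap
        return (min(lo0, lo1), max(hi0, hi1))
    return None                                   # answers cannot be combined


def add_answer(answers, new):
    """Insert a new interval, merging it with every answer it overlaps."""
    remaining = []
    for old in answers:
        merged = interval_union(old, new)
        if merged is None:
            remaining.append(old)
        else:
            new = merged
    return remaining + [new]
```

Merging on insertion keeps the number of stored answers small, which is
consistent with the lower answer counts reported for the combinator mode.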



Table 3. Results for the truckload program.

       no tabling   tabling (plain)             tabling (sorted)            tabling (combinator)
load   runtime      runtime   space   answers   runtime   space   answers   runtime   space   answers
100           0         100     286       324       100     286       324       100     279       283
200         160         461     979     2,082       461     956     2,069       451     904     1,686
300       2,461       1,039   1,799     4,721     1,041   1,723     4,665       971   1,584     3,543
400      12,400       1,500   2,308     5,801     1,510   2,202     5,751     1,351   2,054     4,449
500    > 5 min.       1,541   2,449     4,972     1,541   2,365     4,935     1,451   2,267     4,017


Table 3 contains the results of running the program in the four different 
modes for different maximum loads. Runtime is in milliseconds and has been 
obtained on the same machine as in Section 2.1. For the modes with tabling the
space usage, in kilobytes, of the tables and number of unique answers have been 
recorded as well. 

It is clear from the results that tabling does have an overhead for small 
loads, but that it scales much better. Both the modes with the canonical form 
and the answer combination have a slight space advantage over plain tabling 
which increases with the total number of answers. There is hardly any runtime 
effect for the canonical form, whereas the answer combination mode is faster 
with increasing load. 

In summary, tabling can be useful for certain programs with CHR constraints 
to considerably improve scalability. Canonicalization of the answer store and 
answer combination can have a favorable impact on both runtime and table 
space depending on the particular problem. 



4 Related and Future Work 

In this paper we have shown that it is possible to integrate the committed 
choice bottom-up execution of CHRs with the tabled top-down execution of 
XSB. In particular the issues related to the consistency of the global CHR store 
and tables have been established and solutions have been formulated for call 
abstraction, tabling constraint stores, answer projection, answer combination 
(e.g. for optimization), and answer entailment checking. 

Several ad hoc approaches to using constraints in XSB exist, such as a meta- 
interpreter [15], interfacing with a solver written in C [9] and explicit constraint 
store management in Prolog [16]. However, these approaches are quite cumber- 
some and lack the ease of use and generality of CHR. 

The main related work that this paper builds on is [4], which presents a 
framework for constraint solvers written with attributed variables. Attributed 
variables are a much cruder tool for writing constraint solvers though. Imple- 
mentation issues such as constraint store representation and scheduling strategies 
that are hidden by CHR become the user's responsibility when programming with
attributed variables. Also in the tabled setting, the user has to think through
all the integration issues of an attributed variables solver. For CHR we have
provided generic solutions that work for all CHR constraint solvers and more 
powerful features can be accessed through parametrized options. 

Guo and Gupta propose a technique for dynamic programming with tabling 
([12]) that is somewhat similar to the one proposed here. During entailment 
checking they compare a particular argument in a new answer with the value 
in the previous answer and keep either one based on whether that argument 
needs to be minimized or maximized. Their technique is specified for particular 
numeric arguments whereas ours is for constraint stores. Further investigation 
of our proposal is certainly necessary to establish the extent of its applicability. 

In [18] we briefly discuss two applications of CHR with tabling in the field
of model checking. The integration of CHR and XSB has been shown to make the
implementation of model checking applications with constraints a lot easier.
Indeed the next step in the search for applications is to explore more expressive
models than are currently viable with traditional approaches.

We still have to look at how to implement partial abstraction and the im- 
plications of variant and subsumption based tabling. Partial abstraction and 
subsumption are closely related. The former transforms a call into a more gen- 
eral call while the latter looks for answers to more general calls, but if none are 
available still executes the actual call. 

Finally, we would like to mention that an XSB release with the presented 
CHR system will soon be publicly available (see http://xsb.sf.net).

Abstract. In this paper we consider a logic programming framework
for reasoning about imprecise probabilities. In particular, we propose a
new semantics for the Probabilistic Logic Programs (p-programs) of Ng
and Subrahmanian. P-programs represent imprecision using probability
intervals. Our semantics, based on the possible worlds semantics, considers
all point probability distributions that satisfy a given p-program.
In the paper, we provide the exact characterization of such models of a
p-program. We show that the set of models of a p-program cannot, in
the general case, be described by single intervals associated with atoms of
the program. We provide algorithms for efficient construction of this set
of models and study their complexity.



1 Introduction 

Probabilities quantize our knowledge about possibilities. Imprecise probabilities 
represent our uncertainty about such quantization. They arise from incomplete 
data or from human unsureness. They occur in the analyses of survey responses, 
in the use of GIS data, and risk assessment, to cite a few application domains. 
The importance of imprecise probabilities has been observed by numerous
researchers in the past 10-15 years [14,3] and led to the establishment of the
Imprecise Probabilities Project [6]. The appeal of standard, or point,
probability theory is its clarity. Given the need for imprecision, there are many models:
second-order distributions, belief states or lower envelopes, intervals, and oth- 
ers. Among them, probability intervals as the means of representing imprecision 
are the simplest extension of the traditional probability models. Even then, a
variety of different explanations of what it means for a probability of an event
e to be expressed as an interval $[a, b] \subseteq [0,1]$ have been proposed in
the past decade and a half [14,2,15,3]. Among them, the possible worlds approach
introduced for probability distributions by De Campos, Huete and Moral [3] and
extended to Kolmogorov probability theory by Weichselberger [15] is probably
the most appealing from the point of view of the origins of imprecision.
According to [3,15], there is a single true point probability distribution
underlying a collection of random variables or events. The imprecision, expressed
in terms of probability intervals, stems from our inability to establish what this
true distribution is. Thus, interval probabilities are constraints on the set of
possible point probability distributions that can be true.

Previous logic programming frameworks addressing the issue of imprecision 
in known probabilities [8, 7, 9, 4, 5] had taken similar approaches to interpreting 
probability intervals, but have stopped short of considering precise descriptions 
of sets of point probabilities as the semantics of probabilistic logic programs. In 
this paper, we seek to extend one such logic programming framework,
Probabilistic Logic Programs (p-programs), introduced by Ng and Subrahmanian [9],
to capture the exact set of point probabilities that satisfy a p-program.

The main contributions of this paper are as follows. First, we describe the 
possible worlds semantics for a simple logic programming language in which 
probability intervals are associated with each atom in a program (simple
p-programs) (Section 2). The syntax of the language and its model theory are
from [9]; however, we show that the fixpoint semantics described there does not
capture precisely the set of all point probability models of a program (Section
2.2). We then proceed to describe this set of models formally (Section 3.1), and
provide an explicit construction for it (Section 3.2), complete with algorithms for
implementing this construction. We show that associating single intervals with
atoms of p-programs is not sufficient to capture their model-theoretic semantics:
one has to consider unions of open, closed and semi-closed intervals. We also
show that while the size of such a description of the set of models of a simple
p-program can be, in the worst case, exponential in the size of the program, our
algorithm GenModT for its construction works in an efficient manner.

2 Probabilistic Logic Programs 

In this section we describe a simplified version of the Probabilistic Logic
Programs of Ng and Subrahmanian [9]. Let L be some first order language containing
infinitely many variable symbols, finitely many predicate symbols and no
function symbols. Let $B_L = \{A_1, \ldots, A_N\}$ be the Herbrand base of L. A
p-annotated atom is an expression $A : \mu$ where $A \in B_L$ and
$\mu = [\alpha, \beta] \subseteq [0,1]$.

P-annotated atoms represent probabilistic information. Every atom in $B_L$
is assumed to represent an (uncertain) event or statement. A p-annotated atom
$A : [\alpha, \beta]$ is read as "the probability of the event corresponding to A
to occur (have occurred) lies in the interval $[\alpha, \beta]$". Probabilistic
Logic Programs (p-programs) are constructed from p-annotated formulas as follows.
Let $A, A_1, \ldots, A_n$ be some atoms and $\mu, \mu_1, \ldots, \mu_n$ be
subintervals of [0,1] (also called annotations). Then, a simple p-clause is an
expression of the form
$A : \mu \leftarrow A_1 : \mu_1 \wedge \ldots \wedge A_n : \mu_n$
(if $n = 0$, as usual, the p-clause $A : \mu \leftarrow$ is referred to as a fact).
A simple Probabilistic Logic Program (p-program) is a finite collection of simple
p-clauses.

2.1 Model Theory and Fixpoint Semantics 

The model theory assumes that in the real world each atom from $B_L$ is either
true or false. However, the observer does not have exact information about the real
world, and expresses his/her uncertainty in the form of a probability range. Given
$B_L$, a world probability density function $KI$ is defined as
$KI : 2^{B_L} \to [0,1]$, $\sum_{W \subseteq B_L} KI(W) = 1$. Each subset $W$ of
$B_L$ is considered to be a possible world and $KI$ associates a point probability
with it. A probabilistic interpretation (p-interpretation) $I$ is defined on
$B_L$ as follows: $I : B_L \to [0,1]$, $I(A) = \sum_{A \in W} KI(W)$.
P-interpretations assign probabilities to individual atoms of $B_L$ by adding up
the probabilities of all worlds in which a given atom is true. P-interpretations
specify the model-theoretic semantics of p-programs. Given a p-interpretation
$I$, the following definitions of satisfaction are given:

- $I \models A : \mu$ iff $I(A) \in \mu$;

- $I \models A_1 : \mu_1 \wedge \ldots \wedge A_n : \mu_n$ iff
$(\forall 1 \leq i \leq n)(I \models A_i : \mu_i)$;

- $I \models A : \mu \leftarrow A_1 : \mu_1 \wedge \ldots \wedge A_n : \mu_n$ iff
either $I \models A : \mu$ or
$I \not\models A_1 : \mu_1 \wedge \ldots \wedge A_n : \mu_n$.

Now, given a p-program $P$, $I \models P$ iff for all p-clauses $C \in P$,
$I \models C$. Let Mod(P) denote the set of all p-interpretations that satisfy
p-program $P$. It is convenient to view a single p-interpretation $I$ as a point
$(I(A_1), \ldots, I(A_N))$ in the $N$-dimensional unit cube $E^N$. Then, Mod(P)
can be viewed as a subset of $E^N$. $P$ is called consistent iff
$Mod(P) \neq \emptyset$; otherwise $P$ is called inconsistent.
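
The definitions above translate directly into a small executable model. In the
following Python sketch a world is a frozenset of atoms, $KI$ is a dictionary
from worlds to exact fractions, an annotation is a pair (lo, hi), and a clause is
a pair (head, body). The encoding and all names are assumptions made for
illustration.

```python
from fractions import Fraction as F


def p_interpretation(ki, atoms):
    """I(A) is the sum of KI(W) over all worlds W that contain A."""
    assert sum(ki.values()) == 1, "KI must be a probability distribution"
    return {a: sum(p for world, p in ki.items() if a in world) for a in atoms}


def sat_atom(interp, atom, mu):
    """I |= A : mu  iff  I(A) lies in the closed interval mu."""
    lo, hi = mu
    return lo <= interp[atom] <= hi


def sat_clause(interp, clause):
    """A clause holds if its head holds or its body does not hold."""
    head, body = clause
    return sat_atom(interp, *head) or not all(sat_atom(interp, *b) for b in body)


def sat_program(interp, program):
    return all(sat_clause(interp, c) for c in program)


# A made-up distribution over the worlds of the Herbrand base {a, b}.
KI = {frozenset(): F(3, 10), frozenset({"a"}): F(1, 4),
      frozenset({"b"}): F(9, 20), frozenset({"a", "b"}): F(0)}
I = p_interpretation(KI, ["a", "b"])      # I(a) = 1/4, I(b) = 9/20
```

For this interpretation the clause b : [0.2, 0.3] <- a : [0.2, 0.3] is violated,
since the body holds and the head does not, while the clause
b : [0.4, 0.5] <- a : [0.3, 0.4] is satisfied.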

Fixpoint semantics of simple p-programs is defined in terms of functions that 
assign intervals of probability values to atoms of Bl- An atomic function is a 
mapping / : Bl — * C[0, 1] where C[0, 1] denotes the set of all closed subintervals 
of [0,1]. Generally, an atomic function / describes a closed parallelepiped in TV- 
dimensional space: a family of p-interpretations 1(f) = {/|(VA £ B l )(I(A) £ 
f(A))j is associated. 

The set of all atomic functions over $B_L$ forms a complete lattice $TT$ w.r.t.
the subset inclusion: $f_1 \le f_2$ iff $(\forall A \in B_L)(f_1(A) \supseteq f_2(A))$.
The bottom element $\bot$ of this lattice is the atomic function that assigns the
interval $[0,1]$ to all atoms, and the top element $\top$ is the atomic function
that assigns $\emptyset$ to all atoms.

Given a simple p-program $P$, the fixpoint operator $T_P : TT \to TT$ is
defined as $T_P(f)(A) = \cap M_A$, where
$M_A = \{\mu \mid A : \mu \leftarrow B_1 : \mu_1 \wedge \ldots \wedge B_n : \mu_n \in P$
and $(\forall 1 \le i \le n)(f(B_i) \subseteq \mu_i)\}$. Ng and Subrahmanian show
that this operator is monotonic [9]. The iterations of $T_P$ are defined in the
standard way: (i) $T_P^0 = \bot$; (ii) $T_P^{\alpha+1} = T_P(T_P^{\alpha})$, where
$\alpha + 1$ is the successor ordinal whose immediate predecessor is $\alpha$;
(iii) $T_P^{\lambda} = \sqcup\{T_P^{\alpha} \mid \alpha < \lambda\}$, where
$\lambda$ is a limit ordinal. Ng and Subrahmanian show that the least fixpoint
$\mathrm{lfp}(T_P)$ of the $T_P$ operator is reachable after a finite number of
iterations ([9], Lemma 4). They also show that if a p-program $P$ is consistent,
then $\mathcal{I}(\mathrm{lfp}(T_P))$ contains $Mod(P)$ ([9], Theorem 5,
Claim (i)).
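
The iteration of the fixpoint operator can be sketched in a few lines of Python.
This is only an illustrative sketch under stated assumptions, not the authors'
implementation: a clause is encoded as a hypothetical triple
(head atom, head interval, body list), closed intervals are (lo, hi) pairs,
`None` stands for the empty interval, and the intersection of an empty family
$M_A$ is taken to be $[0,1]$. The sample input is the program called $P_1$ in
Figure 1, with its clauses as described in the proof of Proposition 1.

```python
# Closed intervals are (lo, hi) pairs; None stands for the empty interval.
def intersect(x, y):
    if x is None or y is None:
        return None
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    return (lo, hi) if lo <= hi else None


def subset(x, y):
    """True iff interval x is contained in interval y."""
    return x is None or (y is not None and y[0] <= x[0] and x[1] <= y[1])


def tp_step(program, f):
    """One application of T_P: intersect the heads of all clauses whose bodies fire."""
    g = {}
    for atom in f:
        value = (0.0, 1.0)  # intersection of an empty family (assumption)
        for head, mu, body in program:
            if head == atom and all(subset(f[b], nu) for b, nu in body):
                value = intersect(value, mu)
        g[atom] = value
    return g


def lfp(program, atoms):
    """Iterate T_P from the bottom element until nothing changes."""
    f = {a: (0.0, 1.0) for a in atoms}
    while True:
        g = tp_step(program, f)
        if g == f:
            return f
        f = g


# Program P1 (hypothetical encoding of the clauses of Figure 1).
P1 = [
    ("a", (0.2, 0.4), []),
    ("b", (0.2, 0.5), []),
    ("b", (0.2, 0.3), [("a", (0.2, 0.3))]),
    ("b", (0.4, 0.5), [("a", (0.3, 0.4))]),
]

print(lfp(P1, ["a", "b"]))  # {'a': (0.2, 0.4), 'b': (0.2, 0.5)}
```

Neither rule of $P_1$ ever fires here, because $[0.2, 0.4]$ is contained in
neither body annotation, so the iteration stabilizes after the facts are applied.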

2.2 Fixpoint Is Not Enough 

At the same time, the converse of the last statement is not true, as evidenced
by the following examples. First, consider p-program $P_1$ shown in Figure 1.

Proposition 1. There exists a p-interpretation $I$ s.t.
$I \in \mathcal{I}(\mathrm{lfp}(T_{P_1}))$ but $I \not\models P_1$.

Program $P_1$:

    $a : [0.2, 0.4] \leftarrow .$                      (1)
    $b : [0.2, 0.5] \leftarrow .$                      (2)
    $b : [0.2, 0.3] \leftarrow a : [0.2, 0.3].$        (3)
    $b : [0.4, 0.5] \leftarrow a : [0.3, 0.4].$        (4)

Program $P_2$:

    $a : [0.2, 0.4] \leftarrow .$                      (1)
    $b : [0.3, 0.5] \leftarrow .$                      (2)
    $b : [0.6, 0.7] \leftarrow a : [0.2, 0.3].$        (3)
    $b : [0.6, 0.7] \leftarrow a : [0.3, 0.4].$        (4)

Fig. 1. Sample p-programs.

Proof. It is easy to see that neither rule (3) nor rule (4) will fire during the
computation of the least fixpoint. Indeed, $T_{P_1}^1(a) = [0.2, 0.4]$ and
$T_{P_1}^1(b) = [0.2, 0.5]$ based on clauses (1) and (2). However, at the next
step, as $[0.2, 0.4] \not\subseteq [0.2, 0.3]$, rule (3) will not fire, and as
$[0.2, 0.4] \not\subseteq [0.3, 0.4]$, rule (4) will not fire. Therefore,
$\mathrm{lfp}(T_{P_1}) = T_{P_1}^1$.

Now, consider a p-interpretation $I$ such that $I(a) = 0.2$ and $I(b) = 0.35$.
Clearly, $I \in \mathcal{I}(\mathrm{lfp}(T_{P_1}))$. However, $I \not\models P_1$.
Indeed, as $I(a) = 0.2 \in [0.2, 0.3]$, $I$ satisfies the body of rule (3). Then
$I$ must satisfy its head, i.e., $I(b) \in [0.2, 0.3]$. However,
$I(b) = 0.35 \notin [0.2, 0.3]$, and therefore rule (3) is not satisfied by $I$. ■
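
The counterexample in this proof can be checked mechanically. The following is a
minimal sketch, assuming a hypothetical encoding of clauses as
(head atom, head interval, body list) triples and of a p-interpretation as a
dictionary from atoms to point probabilities; it applies the three satisfaction
rules of Section 2.1 directly.

```python
def holds(I, atom, mu):
    """I |= atom : mu  iff  I(atom) lies in the closed interval mu."""
    return mu[0] <= I[atom] <= mu[1]


def satisfies_clause(I, clause):
    """A clause is satisfied iff its head holds or its body does not."""
    head, mu, body = clause
    body_true = all(holds(I, b, nu) for b, nu in body)
    return holds(I, head, mu) or not body_true


def is_model(I, program):
    return all(satisfies_clause(I, c) for c in program)


# Program P1 (hypothetical encoding of the clauses of Figure 1).
P1 = [
    ("a", (0.2, 0.4), []),
    ("b", (0.2, 0.5), []),
    ("b", (0.2, 0.3), [("a", (0.2, 0.3))]),
    ("b", (0.4, 0.5), [("a", (0.3, 0.4))]),
]

I = {"a": 0.2, "b": 0.35}
print(is_model(I, P1))                        # False: rule (3) is violated
print(is_model({"a": 0.25, "b": 0.25}, P1))   # True
```

The point $I$ lies inside the box $[0.2, 0.4] \times [0.2, 0.5]$ given by the
least fixpoint, yet it fails rule (3), which is exactly the gap the proposition
describes.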
We note that the fixpoint of $P_1$ is defined but it is not tight enough to
represent exactly the set of satisfying p-interpretations. It is also possible
for a p-program to have a well-defined fixpoint but be inconsistent. Consider
p-program $P_2$ from Figure 1.

Proposition 2.
1. $\mathrm{lfp}(T_{P_2}) = T_{P_2}^1$. In particular,
   $\mathrm{lfp}(T_{P_2})(a) = [0.2, 0.4]$ and $\mathrm{lfp}(T_{P_2})(b) = [0.3, 0.5]$.
2. $Mod(P_2) = \emptyset$.

Proof. The first part is similar to the proof of Proposition 1. To show that
$Mod(P_2) = \emptyset$, consider some p-interpretation $I$ such that
$I \models P_2$. Let $I(a) = p$. As $p \in \mathrm{lfp}(T_{P_2})(a) = [0.2, 0.4]$,
then $p \in [0.2, 0.3]$ or $p \in [0.3, 0.4]$. In either case, the body of at
least one of the rules (3), (4) will be satisfied by $I$ and therefore
$I(b) \in [0.6, 0.7]$. However, we know that
$I(b) \in \mathrm{lfp}(T_{P_2})(b) = [0.3, 0.5]$, which leads to a
contradiction. ■

Note that $\mathrm{lfp}(T_P)$ specifies the semantics of a p-program as the set
of p-interpretations inside a single $N$-dimensional parallelepiped whose borders
are defined by $\mathrm{lfp}(T_P)(A_1), \ldots, \mathrm{lfp}(T_P)(A_N)$.
Unfortunately, this is not always the case, i.e., $Mod(P)$ need not be a single
$N$-dimensional parallelepiped, as evidenced by the following proposition.

Proposition 3. If the atoms in $B_P$ for $P_1$ (Figure 1) are ordered as $a, b$,
then $Mod(P_1) = [0.2, 0.3) \times [0.2, 0.3] \cup (0.3, 0.4] \times [0.4, 0.5]$.

Proof. First, we show that
$Mod(P_1) \subseteq [0.2, 0.3) \times [0.2, 0.3] \cup (0.3, 0.4] \times [0.4, 0.5]$.

Let $I \models P_1$. As $\mathrm{lfp}(T_{P_1})(a) = [0.2, 0.4]$ (by rule (1)),
three cases are possible.

1. $I(a) \in [0.2, 0.3)$. Consider rules (3) and (4). As $I(a) \in [0.2, 0.3)$,
   the body of rule (3) will be true, and the body of rule (4) will be false.
   Thus, $I$ must satisfy the head of (3), i.e., $I(b) \in [0.2, 0.3]$.
   Therefore $I \in [0.2, 0.3) \times [0.2, 0.3]$.

2. $I(a) = 0.3$. In this case, the bodies of both rule (3) and rule (4) are
   satisfied, and therefore $I$ must satisfy the heads of both rules, i.e.,
   $I(b) \in [0.2, 0.3]$ and $I(b) \in [0.4, 0.5]$. But as
   $[0.2, 0.3] \cap [0.4, 0.5] = \emptyset$, we arrive at a contradiction.
   Therefore, for any p-interpretation $I \models P_1$, $I(a) \ne 0.3$.

3. $I(a) \in (0.3, 0.4]$. Here, the body of rule (3) will not be true, but the
   body of rule (4) will. Therefore, $I$ must satisfy the head of rule (4), i.e.,
   $I(b) \in [0.4, 0.5]$. Then $I \in (0.3, 0.4] \times [0.4, 0.5]$.

Combining the results of all three cases, we get
$I \in [0.2, 0.3) \times [0.2, 0.3] \cup (0.3, 0.4] \times [0.4, 0.5]$, which
proves the inclusion. It is easy to verify that any
$I \in [0.2, 0.3) \times [0.2, 0.3] \cup (0.3, 0.4] \times [0.4, 0.5]$ is a model
of $P_1$. ■

We note here that, in general, the problem of determining whether a simple
p-program $P$ is consistent is hard. We define the set
CONS-P $= \{P \mid Mod(P) \ne \emptyset\}$.

Theorem 1. The set CONS-P is NP-complete.

3 Semantics of PLPs Revisited 

As shown in Section 2.2, even for the simplest p-programs, the set of their
models may have a more complex structure than the one prescribed by the fixpoint
procedure of [9]. In this section we study the problem of exact description and
explicit computation of $Mod(P)$ given a program $P$. We show that in the general
case, $Mod(P)$ is a union of a finite number of $N$-dimensional$^1$ open, closed,
or semi-closed parallelepipeds within the $N$-dimensional unit hypercube
$[0,1]^N$. In Section 3.1 we characterize $Mod(P)$ as the set of solutions of a
family of systems of inequalities constructed from $P$. In Section 3.2 we propose
a way of computing $Mod(P)$ using special transformations of $P$.

3.1 Characterization of Models 

Definition 1. Let $P$ be a simple p-program over the Herbrand base
$B_L = \{A_1, \ldots, A_N\}$. With each atom $A \in B_L$ we will associate a real
variable $x_A$ with domain $[0,1]$. Let
$C = A : [l, u] \leftarrow B_1 : [l_1, u_1] \wedge \ldots \wedge B_k : [l_k, u_k]$,
$k \ge 0$, be a clause of $P$.

The family of systems of inequalities induced by $C$, denoted $INEQ(C)$, is
defined as follows:

- $k = 0$ ($C$ is a fact): $INEQ(C) = \{\{l \le x_A \le u\}\}$.

- $k \ge 1$ ($C$ is a rule):

  $INEQ(C) = T(C) \cup F(C)$;

  $T(C) = \{\{l \le x_A \le u,\ l_i \le x_{B_i} \le u_i \mid 1 \le i \le k\}\}$;

  $F(C) = \{\{x_{B_i} < l_i\} \mid 1 \le i \le k\} \cup \{\{x_{B_i} > u_i\} \mid 1 \le i \le k\}$.

The family $INEQ(P)$ of systems of inequalities for $P = \{C_1, \ldots, C_m\}$ is
defined as
$INEQ(P) = \{\alpha_1 \cup \ldots \cup \alpha_m \mid \alpha_i \in INEQ(C_i),\ 1 \le i \le m\}$.
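
As a small worked instance of Definition 1 (a sketch, using the rule
$b : [0.2, 0.3] \leftarrow a : [0.2, 0.3]$ that the text attributes to program
$P_1$, so that $k = 1$), the two component families unfold as follows:

```latex
\begin{align*}
T(C)    &= \{\{0.2 \le x_b \le 0.3,\ 0.2 \le x_a \le 0.3\}\},\\
F(C)    &= \{\{x_a < 0.2\},\ \{x_a > 0.3\}\},\\
INEQ(C) &= T(C) \cup F(C).
\end{align*}
```

So $INEQ(C)$ contains three systems: one in which both the body and the head
hold, and two in which the body fails.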

$^1$ Whenever we are writing about $N$-dimensional parallelepipeds representing
the set of models of a p-program, we implicitly assume the possibility that the
true dimensionality of some of them can be less than $N$ due to the fact that
with certain atoms of $B_L$ exact point probabilities, rather than intervals,
may be associated.







Alex Dekhtyar and Michael I. Dekhtyar 



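Note that all inequalities in all systems from the definition above involve
only one variable. Given a system $\alpha$ of such inequalities, we denote the
set of its solutions as $Sol(\alpha)$. For $A \in B_L$ let
$l_A^{\alpha} = \max(\{0\} \cup \{l \mid (l \le x_A) \in \alpha\})$ and
$u_A^{\alpha} = \min(\{1\} \cup \{u \mid (x_A \le u) \in \alpha\})$. Then it is
easy to see that

$$Sol(\alpha) = \begin{cases} \emptyset & \text{if for some } A,\ l_A^{\alpha} > u_A^{\alpha}; \\ [l_{A_1}^{\alpha}, u_{A_1}^{\alpha}] \times \ldots \times [l_{A_N}^{\alpha}, u_{A_N}^{\alpha}] & \text{otherwise.} \end{cases}$$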

Informally, the set $INEQ(P)$ represents all possible systems of restrictions
on probabilities of atoms of $B_L$ whose solutions satisfy every clause of $P$.
Of course, not all individual systems of inequalities have solutions, but
$INEQ(P)$ captures all the systems that do, as shown in the following lemma and
theorem.

Lemma 1. Let $C$ be a p-clause and $I$ be a p-interpretation (both over the
same Herbrand base $B_L$). Then $I \models C$ iff
$\{x_A = I(A)\} \in Sol(\alpha)$ for some $\alpha \in INEQ(C)$.

Theorem 2. A p-interpretation $I$ is a model of a simple p-program $P$ iff there
exists a system of inequalities $\alpha \in INEQ(P)$ such that
$\{x_A = I(A)\} \in Sol(\alpha)$.

This leads to the following description of $Mod(P)$:

Corollary 1. $Mod(P) = \bigcup_{\alpha \in INEQ(P)} Sol(\alpha)$.

We denote as $Facts(P)$ and $Rules(P)$ the sets of p-clauses with empty and
non-empty bodies in a p-program $P$, and as $f(P)$ and $r(P)$ their respective
sizes. Let also $k(P)$ be the maximum number of atoms in a body of a rule in $P$.
Then we can obtain the following bound on the size of $Mod(P)$.

Corollary 2. The set of all p-interpretations $I$ that satisfy a simple
p-program $P$ is a union of at most $M(P)$ (not necessarily disjoint)
$N$-dimensional parallelepipeds, where $M(P) = (2k(P) + 1)^{r(P)}$.

This corollary provides an upper bound, exponential in the size of the
p-program, on the number of disjoint parallelepipeds in the set $Mod(P)$. We can
show that this bound cannot be substantially decreased in the general case.
Consider p-program $P_3$ over the set of atoms $\{a, b_1, \ldots, b_n\}$:

    $a : [1, 1] \leftarrow .$                                            (1)
    $b_i : [0, 1] \leftarrow .$                    $i = 1, \ldots, n$    (2i)
    $a : [0, 0] \leftarrow b_i : [0.2, 0.3].$      $i = 1, \ldots, n$    (3i)

Here, $INEQ(1)$ consists of a single equality $x_a = 1$; each $INEQ(2i)$
includes the trivial inequalities $0 \le x_{b_i} \le 1$, and each $INEQ(3i)$
consists of three systems of inequalities:
$\alpha_i^1 = \{0 \le x_{b_i} < 0.2\}$, $\alpha_i^2 = \{0.3 < x_{b_i} \le 1\}$,
and $\alpha_i^3 = \{0.2 \le x_{b_i} \le 0.3;\ x_a = 0\}$. Since $\alpha_i^3$ is
inconsistent with $INEQ(1)$, each consistent set of inequalities in $INEQ(P_3)$
can be represented as $\{x_a = 1\} \cup \bigcup_{i=1}^{n} \alpha_i^{j_i}$ for
some $j_i \in \{1, 2\}$, $i = 1, \ldots, n$. It is easy to see that for any two
different $\alpha$ and $\alpha'$ of this form in $INEQ(P_3)$, the sets
$Sol(\alpha)$ and $Sol(\alpha')$ are disjoint. So $Mod(P_3)$ consists of $2^n$
disjoint $n$-dimensional parallelepipeds. At the same time, $f(P_3) = n + 1$,
$r(P_3) = n$, $k(P_3) = 1$, and a bitwise representation of $P_3$ takes only
$O(n \log n)$ bits.
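
To make the count concrete, here is the case $n = 2$ written out (a worked
sketch, with the atoms ordered as $a, b_1, b_2$; the ordering is an assumption
made only for this illustration). Fact (1) forces $x_a = 1$, so each rule (3i)
can only be satisfied by keeping $x_{b_i}$ outside $[0.2, 0.3]$:

```latex
\begin{align*}
Mod(P_3) = {} & \{1\} \times [0, 0.2) \times [0, 0.2)
          \;\cup\; \{1\} \times [0, 0.2) \times (0.3, 1] \\
        & \cup\; \{1\} \times (0.3, 1] \times [0, 0.2)
          \;\cup\; \{1\} \times (0.3, 1] \times (0.3, 1].
\end{align*}
```

These are $2^2 = 4$ pairwise disjoint bricks, none of which can be merged with
another into a single parallelepiped.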
3.2 Explicit Computation of Models

In this section we will address the following problem: given a simple p-program
$P$, output the description of the set $Mod(P)$ as a union of $N$-dimensional
parallelepipeds.

The construction from the previous section gives one algorithm for computing
$Mod(P)$: given a program $P$, construct explicitly the set of systems of
inequalities $INEQ(P)$ and then solve each system from this set. This algorithm
has exponential worst-case complexity in the size of the program and, as program
$P_3$ illustrates, the worst case cannot be avoided. However, it is not hard to
see that the algorithm based on solving individual systems of inequalities from
$INEQ(P)$ can be quite inefficient. Indeed, as the solution sets of individual
systems of inequalities are not necessarily disjoint, this algorithm may wind up
computing parts of the final solution over and over. In this section, we propose
a different approach to direct computation of the set of models of a simple
p-program, which breaks the solution space into disjoint components and computes
each such component individually.

Consider a simple p-program $P$ over the Herbrand base
$B_L = \{A_1, \ldots, A_N\}$. As $AT(P)$ we denote the multiset of all
p-annotated atoms found in all heads and bodies of clauses in $P$. Given
$A \in B_L$, let $AT(P)[A]$ be the set of all p-annotated atoms of the form
$A : \mu$ from $AT(P)$. Define for each $A \in B_L$ a set $Prb_P(A)$ of all
possible bounds of probability intervals used in $P$ for $A$ as follows:
$Prb_P(A) = \{\langle l, - \rangle \mid A : [l, u] \in AT(P)[A]\} \cup \{\langle u, + \rangle \mid A : [l, u] \in AT(P)[A]\} \cup \{\langle 0, - \rangle, \langle 1, + \rangle\}$.
Thus with each occurrence of a probability bound for $A$ in $P$, we are also
storing (encoded as “-” or “+”) whether it is a lower or upper bound.

We order the elements of $Prb_P(A)$ as follows:
$\langle a, * \rangle < \langle b, * \rangle$ whenever $a < b$, and
$\langle a, - \rangle < \langle a, + \rangle$. Consider now
$Prb_P(A) = \{\beta_1 = \langle 0, - \rangle, \beta_2, \ldots, \beta_m = \langle 1, + \rangle\}$,
where the sequence $\beta_1, \ldots, \beta_m$ is in ascending order. Using the
set $Prb_P(A)$ we will now construct the set of segments $SEG_P(A)$ as follows.

Let $\beta_i = \langle a_i, \lambda_i \rangle$ and
$\beta_{i+1} = \langle a_{i+1}, \lambda_{i+1} \rangle$, $1 \le i \le m - 1$. We
define the segment $s_i$ associated with the pair $\beta_i, \beta_{i+1}$ as shown
in the table on the left side of Figure 2. Now,
$SEG_P(A) = \{s_1, s_2, \ldots, s_{m-1}\}$.

Notice that if $a_i = a_{i+1}$ then $\lambda_i$ is a “-” and $\lambda_{i+1}$ is
a “+” (it follows from our order on the $\beta_j$'s) and the interval
$[a_i, a_{i+1}] = [a_i, a_i]$ will be added to $SEG_P(A)$. The following
proposition establishes basic properties of the segment sets.

Proposition 4. Let $P$ be a simple p-program, $A \in B_L$ and
$SEG_P(A) = \{s_1, \ldots, s_{m-1}\}$.

1. $SEG_P(A)$ is a partition of $[0, 1]$; in particular, if $i \ne j$ then
   $s_i \cap s_j = \emptyset$.

2. Consider some $1 \le i \le m - 1$. Let $x, y \in s_i$ and let $I_1$ and $I_2$
   be p-interpretations such that $I_1(A) = x$ and $I_2(A) = y$. Then for all
   $A : \mu \in AT(P)[A]$, $I_1 \models A : \mu$ iff $I_2 \models A : \mu$.

3. Consider some $1 \le i \le m - 2$. Let $x \in s_i$ and $y \in s_{i+1}$ and let
   $I_1$ and $I_2$ be p-interpretations such that $I_1(A) = x$ and $I_2(A) = y$.
   Then
   $\{A : \mu \in AT(P)[A] \mid I_1 \models A : \mu\} \ne \{A : \mu \in AT(P)[A] \mid I_2 \models A : \mu\}$.

    $\lambda_i$   $\lambda_{i+1}$   $s_i$
    -             -                 $[a_i, a_{i+1})$
    -             +                 $[a_i, a_{i+1}]$
    +             -                 $(a_i, a_{i+1})$
    +             +                 $(a_i, a_{i+1}]$

    (1) Compute $SEG(P)$.
    (2) for each $J \in SEG(P)$ do
    (3)     Choose some interpretation (point) $I \in J$;
    (4)     if $I \models P$ then add $J$ to $Mod(P)$ end if
    (5) end do

Fig. 2. Determination of segments in $SEG(A)$ (left) and algorithm GenMod for
computing $Mod(P)$ (right).



Given a simple p-program $P$ over the Herbrand base
$B_L = \{A_1, \ldots, A_N\}$, the segmentation of $P$, denoted $SEG(P)$, is
defined as follows:

$SEG(P) = \{s^1 \times s^2 \times \ldots \times s^N \mid s^j \in SEG_P(A_j),\ 1 \le j \le N\}$.

Basically, $SEG(P)$ is a segmentation of the $N$-dimensional unit hypercube
into a number of “bricks”. Recall that each point inside the $N$-dimensional
unit hypercube represents a p-interpretation. The following theorem shows that
the set of all p-interpretations satisfying $P$ can be constructed from some
“bricks” of $SEG(P)$.

Theorem 3.
1. Any two different parallelepipeds of $SEG(P)$ do not intersect.
2. For any parallelepiped $J \in SEG(P)$, either $J \subseteq Mod(P)$ or
   $J \cap Mod(P) = \emptyset$.
3. There exists a subset $S \subseteq SEG(P)$ such that
   $Mod(P) = \bigcup_{J \in S} J$.

Consider again program $P_1$ (Fig. 1). Atom $a$ has the set of probability
bounds
$Prb_{P_1}(a) = \{\langle 0, - \rangle, \langle 0.2, - \rangle, \langle 0.3, - \rangle, \langle 0.3, + \rangle, \langle 0.4, + \rangle, \langle 1, + \rangle\}$
and atom $b$ has the set of bounds
$Prb_{P_1}(b) = \{\langle 0, - \rangle, \langle 0.2, - \rangle, \langle 0.3, + \rangle, \langle 0.4, - \rangle, \langle 0.5, + \rangle, \langle 1, + \rangle\}$.
The corresponding sets of segments are
$SEG_{P_1}(a) = \{[0, 0.2), [0.2, 0.3), [0.3, 0.3], (0.3, 0.4], (0.4, 1]\}$ and
$SEG_{P_1}(b) = \{[0, 0.2), [0.2, 0.3], (0.3, 0.4), [0.4, 0.5], (0.5, 1]\}$.

Then $SEG(P_1)$ consists of 25 rectangles of the form $s^1 \times s^2$, where
$s^1 \in SEG_{P_1}(a)$ and $s^2 \in SEG_{P_1}(b)$ (in fact, 5 of them, with
$s^1 = [0.3, 0.3]$, are linear segments). As is shown in Proposition 3, only 2
of them consist of models of $P_1$:
$Mod(P_1) = [0.2, 0.3) \times [0.2, 0.3] \cup (0.3, 0.4] \times [0.4, 0.5]$.
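
The segmentation of $P_1$ and the brick-by-brick test of Figure 2 can be
reproduced with a short Python sketch. The encoding is an assumption made for
illustration only: clauses are (head atom, head interval, body list) triples,
a segment is a tuple (lo, lo_closed, hi, hi_closed), bound signs “-” and “+”
are stored as 0 and 1 so that tuple ordering matches the order on bounds, and
exact rationals are used to avoid rounding at the bounds. Each brick is tested
at the midpoint of its segments.

```python
from fractions import Fraction as F
from itertools import product


def segments(intervals):
    """SEG_P(A) from the closed intervals [l, u] that annotate atom A in P."""
    bounds = {(F(0), 0), (F(1), 1)}          # <0,-> and <1,+>
    for l, u in intervals:
        bounds.add((l, 0))                   # lower bound, sign '-'
        bounds.add((u, 1))                   # upper bound, sign '+'
    bs = sorted(bounds)                      # <a,-> sorts before <a,+>
    # Figure 2 table: a '-' left bound is closed, a '+' right bound is closed.
    return [(a, s == 0, b, t == 1) for (a, s), (b, t) in zip(bs, bs[1:])]


def is_model(I, program):
    for head, (l, u), body in program:
        body_true = all(bl <= I[b] <= bu for b, (bl, bu) in body)
        if body_true and not (l <= I[head] <= u):
            return False
    return True


def gen_mod(program, atoms):
    """Return the per-atom segmentations and the bricks that consist of models."""
    annotated = {a: [] for a in atoms}
    for head, mu, body in program:
        annotated[head].append(mu)
        for b, nu in body:
            annotated[b].append(nu)
    seg = [segments(annotated[a]) for a in atoms]
    models = []
    for brick in product(*seg):
        # The midpoint of a non-empty segment always lies inside it.
        I = {a: (s[0] + s[2]) / 2 for a, s in zip(atoms, brick)}
        if is_model(I, program):
            models.append(brick)
    return seg, models


# Program P1 (hypothetical encoding of the clauses of Figure 1).
P1 = [
    ("a", (F("0.2"), F("0.4")), []),
    ("b", (F("0.2"), F("0.5")), []),
    ("b", (F("0.2"), F("0.3")), [("a", (F("0.2"), F("0.3")))]),
    ("b", (F("0.4"), F("0.5")), [("a", (F("0.3"), F("0.4")))]),
]

seg, models = gen_mod(P1, ["a", "b"])
print(len(list(product(*seg))), len(models))  # 25 2
```

The two surviving bricks are $[0.2, 0.3) \times [0.2, 0.3]$ and
$(0.3, 0.4] \times [0.4, 0.5]$, in agreement with Proposition 3.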

Theorem 3 suggests that $Mod(P)$ can be constructed using the algorithm
GenMod described in Figure 2. We note that steps (3) and (4) of this algorithm
can be processed efficiently. In particular, if
$J = s^1 \times \ldots \times s^N$ and each $s^i$ is a segment with the lower
bound $l^i$ and the upper bound $u^i$, $i = 1, \ldots, N$, then for each $i$ the
value $I(A_i)$ on step (3) can be chosen to be equal to $(l^i + u^i)/2$. So, the
runtime of GenMod is bounded by a polynomial in the size of $SEG(P)$. The size
of $SEG(P)$ is, in turn, exponential in the size of the set $B_P$ of all atoms
of $P$. Of course, it may be the case that some “bricks” in $SEG(P)$ can be
united into one larger “brick”, so that $Mod(P)$ is represented by a smaller
number of bricks than $SEG(P)$. But the program $P_3$ shows that in the general
case even the minimal number of “non-unitable” bricks in $Mod(P)$ can be
exponential in $|B_L|$. Therefore, the worst case running time of algorithm
GenMod cannot be
Algorithm GenModT($P$: program, $\{A_1, \ldots, A_n\}$: atoms)
if $n = 1$ and $P = \bigcup_j \{A_1 : \mu_j \leftarrow .\}$ then $Sol := \bigcap_j \mu_j$
else  // P includes at least two different atoms
    $Sol := \emptyset$;
    $S := NS(P)$;  // compute Ng-Subrahmanian $T_P$ operator
    if $S = \emptyset$ then return($\emptyset$)
    else  // if NS(P) is not empty, proceed with computations
        // reduce P wrt $S = \times_{i=1}^{n} sg_i$
        for $i = 1$ to $n$ do $P := Reduct(P, A_i : sg_i)$ end do
        $Seg := SEG(P, A_1) \cap sg_1$;  // the segmentation of $A_1$ inside
                                         // the $T_P$ operator
        // main loop
        for each $s = \langle a, b \rangle \in Seg$ do
            $P' := Reduct(P, A_1 : s)$;
            if $P'$ is empty then $Sol := Sol \cup (s \times (\times_{i=2}^{n} [0, 1]))$
            else  // find the solution for the reduct
                $RSol := GenModT(P', \{A_2, \ldots, A_n\})$;
                if $RSol \ne \emptyset$ then $Sol := Sol \cup (s \times RSol)$ end if
            end if
        end do
    end if
end if
return $Sol$;

Fig. 3. Algorithm GenModT for computing $Mod(P)$.



improved. At the same time, we can improve on GenMod by being more careful
about how the $N$-dimensional “bricks” are considered.

We fix an ordering $A_1, \ldots, A_N$ of $B_L$. Given a simple p-program $P$,
let $\mathrm{lfp}(T_P)(A_i) = sg_i$ and $NS(P) = \times_{i=1}^{N} sg_i$. From [9]
we know that $Mod(P) \subseteq NS(P)$. We observe that it is sufficient to
segment $NS(P)$ rather than the unit $N$-dimensional hypercube to compute
$Mod(P)$. For a set of segments $S$ and a segment $\mu$ let us denote by
$S \cap \mu$ the set $\{s \mid s \in S$ and $s \subseteq \mu\}$.

Given a simple p-program $P$, an atom $A \in B_P$ and an interval
$\nu \subseteq [0, 1]$, we denote by $Reduct(P, A : \nu)$ a reduced program
which results from $P$ as follows:

(i) Delete from $P$ any clause $C$ with the head $A : \mu$ such that
    $\nu \subseteq \mu$.

(ii) Delete from $P$ any clause $C$ whose body includes an atom $A : \mu$ such
     that $\mu \cap \nu = \emptyset$.

(iii) Delete from the body of any other rule each atom $A : \mu$ such that
      $\nu \subseteq \mu$.

It is easy to see that
$Mod(Reduct(P, A : \nu)) = Mod(P \cup \{A : \nu \leftarrow .\})$.
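
The three deletion steps can be sketched in Python as follows. This is an
illustrative sketch under assumptions that are not part of the original text:
clauses are (head atom, head interval, body list) triples with closed intervals
(l, u), the interval $\nu$ is passed as a non-empty segment
(lo, lo_closed, hi, hi_closed) so that half-open segments can be handled, and
clauses that match none of the three steps are copied unchanged. The sample
calls use program $P_1$ of Figure 1.

```python
def contained(seg, mu):
    """seg is a subset of the closed interval mu (seg assumed non-empty)."""
    lo, _, hi, _ = seg
    return mu[0] <= lo and hi <= mu[1]


def disjoint(seg, mu):
    """seg and the closed interval mu have no point in common."""
    lo, lo_closed, hi, hi_closed = seg
    l, u = mu
    return (hi < l or (hi == l and not hi_closed)
            or lo > u or (lo == u and not lo_closed))


def reduct(program, atom, seg):
    result = []
    for head, mu, body in program:
        if head == atom and contained(seg, mu):
            continue                                  # step (i)
        if any(b == atom and disjoint(seg, nu) for b, nu in body):
            continue                                  # step (ii)
        new_body = [(b, nu) for b, nu in body
                    if not (b == atom and contained(seg, nu))]  # step (iii)
        result.append((head, mu, new_body))
    return result


# Program P1 (hypothetical encoding of the clauses of Figure 1).
P1 = [
    ("a", (0.2, 0.4), []),
    ("b", (0.2, 0.5), []),
    ("b", (0.2, 0.3), [("a", (0.2, 0.3))]),
    ("b", (0.4, 0.5), [("a", (0.3, 0.4))]),
]

# Reduce w.r.t. the box [0.2, 0.4] x [0.2, 0.5]: both facts disappear.
rules = reduct(reduct(P1, "a", (0.2, True, 0.4, True)), "b", (0.2, True, 0.5, True))
print(reduct(rules, "a", (0.2, True, 0.3, False)))   # segment [0.2, 0.3)
print(reduct(rules, "a", (0.3, True, 0.3, True)))    # segment [0.3, 0.3]
print(reduct(rules, "a", (0.3, False, 0.4, True)))   # segment (0.3, 0.4]
```

For the segment $[0.2, 0.3)$ only the fact $b : [0.2, 0.3] \leftarrow .$
remains; for $[0.3, 0.3]$ both rules turn into facts with incompatible heads;
for $(0.3, 0.4]$ only $b : [0.4, 0.5] \leftarrow .$ remains.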

Figure 3 contains the pseudocode for the algorithm GenModT, designed to
intelligently execute all steps of the algorithm GenMod. The algorithm works
as follows. On the first step, we compute $NS(P)$, reduce $P$ w.r.t. $NS(P)$ and
construct the segmentation of $A_1$. Then, for each segment, we construct a
reduced program $P'$, recursively run GenModT on $P'$ and the set
$\{A_2, \ldots, A_n\}$ of atoms, and combine the solution returned by the
recursive call with the segment of $A_1$ for which it was obtained. The union of
solutions computed this way is returned at the end of each call to GenModT. The
stopping conditions are either an empty reduct program, meaning that the
segmentation leading to this reduct yields a part of the final solution, or a
contradiction during the computation of $NS(P)$, meaning that the current
segmentation does not yield models of $P$. The theorem below states that
algorithm GenModT is correct.

Theorem 4. Given a simple p-program $P$ and an ordering $A_1, \ldots, A_N$ of
$B_P$, algorithm GenModT returns the set $Mod(P)$.

Apart from using $NS(\cdot)$ as starting points for segmentation on every step,
algorithm GenModT improves over a naive implementation of GenMod in two
ways. First, it may turn out that one of the stopping conditions for GenModT
holds before the recursion has exhausted all atoms from $P$. In this case, either
an entire sub-space is part of the solution or it is not part of the solution,
but we no longer need to check each “brick” inside that sub-space. Second, on
each step of the recursion after the first one, segmentation of the current atom
occurs with respect to the current program, which is a reduct of $P$ w.r.t. all
previously considered atoms. This reduct has a simpler structure and, in many
cases, would have fewer and shorter rules. This means that the segmentation of
the current atom w.r.t. the reduct may contain fewer segments than the
segmentation w.r.t. the original program $P$. Another convenient feature of
GenModT is that it structures $Mod(P)$ in the form of a tree, corresponding to
the way it recursively enumerates the solutions.

The advantages of GenModT over a naive implementation of GenMod are
demonstrated by the example of program $P_1$ (Fig. 1). It was shown that
$NS(P_1) = [0.2, 0.4] \times [0.2, 0.5]$ and that

$SEG_{P_1}(a) = \{[0, 0.2), [0.2, 0.3), [0.3, 0.3], (0.3, 0.4], (0.4, 1]\}$ and
$SEG_{P_1}(b) = \{[0, 0.2), [0.2, 0.3], (0.3, 0.4), [0.4, 0.5], (0.5, 1]\}$.

So, at the first step of GenModT,
$Seg = SEG_{P_1}(a) \cap [0.2, 0.4] = \{[0.2, 0.3), [0.3, 0.3], (0.3, 0.4]\}$ and
the main loop will proceed three times as follows:

1) $s = [0.2, 0.3)$, $P' = \{b : [0.2, 0.3] \leftarrow .\}$,
   $Sol = \{[0.2, 0.3) \times [0.2, 0.3]\}$;

2) $s = [0.3, 0.3]$, $P' = \{b : [0.2, 0.3] \leftarrow .;\ b : [0.4, 0.5] \leftarrow .\}$,
   $Sol := Sol \cup \emptyset$;

3) $s = (0.3, 0.4]$, $P' = \{b : [0.4, 0.5] \leftarrow .\}$,
   $Sol := Sol \cup \{(0.3, 0.4] \times [0.4, 0.5]\}$.

The result will be
$Sol = [0.2, 0.3) \times [0.2, 0.3] \cup (0.3, 0.4] \times [0.4, 0.5]$, which is
equal to $Mod(P_1)$ (see Proposition 3). Thus, GenModT tries only 3 bricks while
GenMod will check all 25 bricks.

4 Related Work and Conclusions 

There have been a number of logic programming frameworks for uncertainty
proposed in the past 15 years (see [4] for a detailed survey), most concentrating
on point probabilities. The work of Poole [13] and Ngo and Haddawy [12] treated
the “$\leftarrow$” as conditional dependence and used logic programming to model
Bayesian networks. In more recent work, Baral et al. [1] present an elegant way
to incorporate probabilistic reasoning into an answer set programming framework,
in which




Possible Worlds Semantics for Probabilistic Logic Programs 






they combine probabilistic reasoning with traditional non-monotonic reasoning.
At the same time, some work [8-11,4,5] looked at interval probabilities as the
means of expressing imprecision in probability assessment.
In all those frameworks, the underlying semantics allowed for expression of the
possible probability of an atom in a program as a single closed interval. Our
work is the first to consider the harder problem of describing the semantics of
interval-based probabilistic logic programs with sets of point probability
assessments (p-interpretations), based on the semantics of interval probabilities
proposed by de Campos et al. [3] and Weichselberger [15]. As shown in this paper,
even for a fairly simple syntax, such descriptions become more complex than
single intervals and their computation is much more strenuous. Our next step is
to study our semantics in the full language of p-programs of [9] and hybrid
probabilistic programs [4]. We are also interested in investigating the
relationship between the p-programs with possible worlds semantics and
constraint logic programs.



References 

1. Chitta Baral, Michael Gelfond, J. Nelson Rushton. (2004) Probabilistic
   Reasoning With Answer Sets, in Proc. LPNMR-2004, pp. 21-33.

2. V. Biazzo, A. Gilio. (1999) A Generalization of the Fundamental Theorem of de
   Finetti for Imprecise Conditional Probability Assessments, Proc. 1st Intl.
   Symposium on Imprecise Probabilities and Their Applications.

3. Luis M. de Campos, Juan F. Huete, Serafin Moral. (1994) Probability Intervals:
   A Tool for Uncertain Reasoning, International Journal of Uncertainty,
   Fuzziness and Knowledge-Based Systems (IJUFKS), Vol. 2(2), pp. 167-196.

4. A. Dekhtyar and V.S. Subrahmanian. (2000) Hybrid Probabilistic Programs,
   Journal of Logic Programming, Volume 43, Issue 3, pp. 187-250.

5. M.I. Dekhtyar, A. Dekhtyar and V.S. Subrahmanian. (1999) Hybrid Probabilistic
   Programs: Algorithms and Complexity, in Proc. of 1999 Conf. on Uncertainty in
   AI (UAI), pp. 160-169.

6. H.E. Kyburg Jr. (1998) Interval-valued Probabilities, in G. de Cooman,
   P. Walley and F.G. Cozman (Eds.), Imprecise Probabilities Project,
   http://ippserv.rug.ac.be/documentation/interval_prob/interval_prob.html.

7. V.S. Lakshmanan and F. Sadri. (1994) Modeling Uncertainty in Deductive
   Databases, Proc. Int. Conf. on Database Expert Systems and Applications
   (DEXA'94), September 7-9, 1994, Athens, Greece, Lecture Notes in Computer
   Science, Vol. 856, Springer (1994), pp. 724-733.

8. V.S. Lakshmanan and F. Sadri. (1994) Probabilistic Deductive Databases, Proc.
   Int. Logic Programming Symp. (ILPS'94), November 1994, Ithaca, NY, MIT Press.

9. R. Ng and V.S. Subrahmanian. (1993) Probabilistic Logic Programming,
   Information and Computation, 101, 2, pp. 150-201.

10. R. Ng and V.S. Subrahmanian. (1993) A Semantical Framework for Supporting
    Subjective and Conditional Probabilities in Deductive Databases, Journal of
    Automated Reasoning, 10, 2, pp. 191-235.

11. R. Ng and V.S. Subrahmanian. (1995) Stable Semantics for Probabilistic
    Deductive Databases, Information and Computation, 110, 1, pp. 42-83.

12. L. Ngo, P. Haddawy. (1995) Probabilistic Logic Programming and Bayesian
    Networks, in Proc. ASIAN-1995, pp. 286-300.

13. D. Poole. (1993) Probabilistic Horn Abduction and Bayesian Networks,
    Artificial Intelligence, Vol. 64(1), pp. 81-129.

14. P. Walley. (1991) Statistical Reasoning with Imprecise Probabilities,
    Chapman and Hall.

15. K. Weichselberger. (1999) The Theory of Interval-Probability as a Unifying
    Concept for Uncertainty, Proc. 1st International Symp. on Imprecise
    Probabilities and Their Applications.




Limiting Resolution: 

From Foundations to Implementation 



Patrick Caldon and Eric Martin 

The University of New South Wales, Sydney, 2052, Australia 

pate, emartin@cse.unsw.edu.au



Abstract. We present a generalization of SLD-resolution, Limiting Resolution
(LR), which embeds concepts from the field of inductive inference into logic
programming. This paper describes the development of
LR from theoretical underpinnings through to demonstrating a practical 
implementation. LR is designed to represent and solve problems which 
are not purely deductive more easily than current logic programming 
formalisms. It is based on the notion of identification in the limit, where 
successful computations produce a non-halting converging sequence of 
outputs as opposed to computations which produce a single output and 
halt. The queries of LR are of the form
$\exists x(\psi(x) \wedge \forall y \neg\chi(x, y))$, with some restrictions on
the occurrence of negation in $\psi$ and $\chi$. The programs are
divided into background knowledge and a potentially infinite stream of 
data which drives the construction of the converging sequence of outputs. 
In some problems true negation can be applied to data in this stream. We 
describe the logical foundations of LR, where the notions of induction, 
deduction and identification in the limit are unified in a common frame- 
work. The programs, queries, and proof procedure of LR are precisely 
defined, and a completeness result is stated. Furthermore we present a 
Prolog-style system, RichProlog, which implements LR, and provide an 
extended example of RichProlog’s execution. This example shows that it 
is possible to solve genuine problems in polynomial time and also illus- 
trates RichProlog’s utility, conciseness, and declarative nature. 



1 Introduction 

1.1 Motivation and Background 

Many problems encountered in AI have a significant inductive component, mak- 
ing application of deductive logic difficult and requiring recourse to nonmono- 
tonic reasoning formalisms. Many AI problems require reasoning from only part 
of the total possible data set, when not all the data are currently available. This 
entails the possibility of a mind change when new data appear. It has been argued 
that current nonmonotonic reasoning formalisms do not account satisfactorily 
for common-sense reasoning of the kind needed to solve AI problems, particularly 
as the formalisms do not account for how tentative solutions may subsequently 
be contradicted[19]. The field of inductive inference deals with subsequent con- 
tradiction to a hypothesis in a not explicitly logical framework [9], which has 



B. Demoen and V. Lifschitz (Eds.): ICLP 2004, LNCS 3132, pp. 149-164, 2004. 
(c) Springer- Verlag Berlin Heidelberg 2004 










recently been extended to a logical formalism [15, 16]. This paper outlines the
theoretical underpinnings of a logic programming system, Limiting Resolution
(LR), based on inductive inference, describes LR itself and demonstrates a
practical implementation.

We remind the reader of the fundamental concepts of inductive inference. A
system is said to identify or learn some set $\mathcal{L}$ of countable nonempty
sets in the limit [6, 9] if, given an unknown member $L$ of $\mathcal{L}$ with an
infinite enumeration $e$ of the members of $L$ (possibly with repetitions), it
produces a correct description of $L$ in response to all but a finite number of
initial segments of $e$. Visualize this scenario as playing a game where you
must guess a set that we have in mind. Suppose we decide to play with
$\mathcal{L}$ as the set of sets consisting of all the numbers greater than some
natural number (an example member of $\mathcal{L}$ would be {4, 5, 6, ...}).
Having first chosen (for example) $L$ = {2, 3, 4, ...}, we then proceed through
the game in rounds of guesses, as follows. Suppose we tell you 4 is in the set;
you might guess {4, 5, 6, ...} as the set. If we tell you 7 is in the set, then
presumably you would decide to stick with your guess. If, however, we tell you
next that 2 is in the set, you might want to change your mind and revise your
guess to {2, 3, 4, ...}. As from this point on we will never mention the number
1, you will always be happy with this guess of {2, 3, 4, ...}. However, since
you have only ever seen a finite initial sequence, it will appear to you that
you may have to revise your guess at any time: for this set $L$ there is no
finite collection of data which allows you to be sure that your guess is
correct, and so you will not have a property analogous to compactness in
classical logic. It is clear, however, that the strategy of “guess the set whose
least element is the lowest number thus far presented” will in the limit produce
the correct hypothesis, that is, after at most a finite number of mind changes
on your part.
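
The guessing strategy described above is easy to simulate. The sketch below is
only an illustration of identification in the limit on a finite prefix of an
enumeration; the function name `guesses` and the convention of representing the
hypothesis {m, m+1, m+2, ...} by its least element m are assumptions made for
this example.

```python
def guesses(stream):
    """Yield the learner's hypothesis after each datum.

    The hypothesis {m, m + 1, m + 2, ...} is represented by its least
    element m, which is the lowest number presented so far.
    """
    least = None
    for n in stream:
        if least is None or n < least:
            least = n          # a mind change (or the very first guess)
        yield least


def mind_changes(hypotheses):
    """Count how often consecutive hypotheses differ."""
    return sum(1 for x, y in zip(hypotheses, hypotheses[1:]) if x != y)


history = list(guesses([4, 7, 2, 9, 3]))
print(history)                # [4, 4, 2, 2, 2]
print(mind_changes(history))  # 1
```

On any enumeration of {2, 3, 4, ...} the hypothesis stops changing once 2 has
been presented, although the learner itself can never certify that this point
has been reached.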

Parametric logic is a novel logic which encompasses notions from classical 
logic and inductive inference [15, 16]. It has a generalized notion of logical con- 
sequence intended to capture more comprehensively the logical nature of many 
problems currently encountered in AI. Parametric logic has a number of key 
properties which distinguish it from classical logic: it is related to the field of 
inductive inference, since some proofs can be considered as identifying in the 
limit from a set of data; it has an explicit idea of intended model; furthermore 
under some constraints exploited in LR it is sound and complete in the limit, 
meaning some finite number of incorrect answers may first be produced, but af- 
ter these incorrect answers (which will require finite computation) only correct 
answers will be produced with no subsequent mind change. Many nonmonotonic 
reasoning formalisms have been proposed; however Parametric logic in general 
foregoes compactness, in contrast to those frameworks examined in [10], which 
all satisfy the compactness property. Several nonmonotonic logics including Re- 
iter’s default logic and Moore’s auto-epistemic logic have close correspondence 
with negation-as-failure semantics for logic programming. Therefore in a logic 
programming context it is more appropriate to examine the relationships with 
nonmonotonic reasoning by analysing negation-as-failure semantics, discussed 
below. 
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LR. is a generalization of resolution based on Parametric logic, with a well 
defined program and query syntax over which it is sound and complete. The 
queries in LR are particular examples of £ 2 sentences, that is sentences of the 
form 3x\fycp(x, y), where tp(x,y) is quantifier free 1 . A LR program is more gen- 
eral than a definite logic program and consists of background knowledge together 
with a potentially infinite stream of data that represent evidence or results of ex- 
periments. At any time, the system has access to the background knowledge plus 
some finite initial segment of the stream of data (the data observed or the results 
of experiments obtained so far). The completeness result guarantees that when 
sufficient data have been collected, the program will output a correct candidate, 
and the same candidate will be output again and again as more data become 
available. Although convergence to this correct candidate is guaranteed, usually 
it is not possible to know in advance how many data will be needed for convergence.
RichProlog² is our implementation of this system. A preliminary version
has been presented in [14] . The current version has a number of differences and 
improvements: there is no longer a confidence threshold for the correctness of 
an answer, there is a theoretically grounded notion of negation built into the 
system, a type system has been added allowing for the enumeration of terms, 
and it has a clearly defined syntax which is directly related to the completeness 
of LR. 

Many AI problems are optimization problems, and so have a natural expression
as a $\Sigma_2$ query, i.e., discovering the existence of some x such that, for all y, x is
preferable to y. A more general example is: does there exist a rule x such that
for all possible data y, x predicts y if and only if y can and will be observed?
Matroids, a class of problems solved by greedy algorithms, have a convenient
expression as an LR query and program, and a natural expression of the minimum
spanning tree problem is shown later in this paper. In particular, solutions
to matroid problems of polynomial time complexity can be implemented with 
polynomial cost, and the example provided demonstrates this in a natural style. 

1.2 Comparison with Existing Formalisms 

Inductive Logic Programming (ILP)[17] systems have been proposed and con- 
structed to learn theories inductively from collections of data; however these 
systems do not attempt to construct a proof in the conventional sense, as the 
computation is some kind of theory construction procedure, e.g. inverse entail- 
ment, and not some kind of Tarskian logical inference which we employ. A sim- 
ilar consideration applies to abductive logic programming. Also, our approach 
is quite distinct from that of answer-set programming, where the objective is to 
discover some stable model under a 3-valued semantics, rather than find a computed
answer substitution in a 2-valued semantics. Furthermore, unlike belief
revision systems, we know that any inconsistency that arises from the program, 
the available data and the tentative solution t of the query, has to be resolved 
by dismissing t , as opposed to allowing revision anywhere in the program. 

1 We use $\bar{x}$ to denote a tuple of variables or terms of appropriate length.

2 See: http://www.cse.unsw.edu.au/~patc/richprolog

Several resolution-style systems have been proposed which incorporate negation,
including SLDNF[3], SLS[20], SLG[4], and constructive negation. The semantics
for these systems are more complex than that for SLD-resolution, requiring
in different cases 3-valued interpretations, restrictions to stratified programs,
stable model semantics and other more complex constructions; Apt and Bol[2]
provide a good summary. The proof technique of LR has some similarities to 
SLDNF, but the semantics are quite different. In particular, SLDNF queries are 
existentially quantified, whereas LR has both existential and universal quan- 
tification, LR permits the selection of non-ground literals, and its semantics 
are defined with respect to the program itself rather than its completion. SLS- 
resolution has a negation as (not necessarily) finite failure rule but is not an 
effective proof procedure as it does not identify a condition analogous to conver- 
gence in the limit. SLG solves the problems in SLS via a tabling mechanism but 
is characterized in terms of a 3-valued stable model semantics, and so is quite 
distinct from our work. SLDNFE[21] also has some similarities in proof theory 
in delaying non-ground literals prior to selection, but again with respect to a 3-valued
semantics. Further differences between our formalism and others include:
the weakening of the compactness property, use of Herbrand interpretations for
intended interpretations as opposed to 3-valued interpretations (e.g., stable and
well-founded interpretations) and inclusion of a form of true negation. Universal 
quantification has been examined in a logic programming context. Voronkov pro- 
poses bounding universal quantifiers [22] with finite sets. Unlike this and similar 
systems, we use unbounded quantification in queries. 



2 Parametric Logic 

The semantics of LR programs is based on Parametric logic. An overview is 
provided here, but see [16] for a more complete exposition. Parametric logic has 
a set of parameters, namely, a countable possible vocabulary $\mathcal{V}$, a class of possible
worlds $\mathcal{W}$ (a class of $\mathcal{V}$-structures), a possible language $\mathcal{L}$ (a set of $\mathcal{V}$-sentences,
e.g., the set of first-order $\mathcal{V}$-sentences), a set $\mathcal{D}$ of possible data and a set $\mathcal{A}$ of
possible assumptions with $\mathcal{D}, \mathcal{A} \subseteq \mathcal{L}$ and $\mathcal{D} \cap \mathcal{A} = \emptyset$. We set

$\mathcal{P} = (\mathcal{V}, \mathcal{W}, \mathcal{L}, \mathcal{D}, \mathcal{A})$

and call $\mathcal{P}$ a logical paradigm. LR is based on settings of the parameters such
that $\mathcal{V}$ contains at least one constant, $\mathcal{W}$ is a nonempty set of Herbrand
interpretations, and $\mathcal{D}$ is a set of ground literals (i.e., atomic sentences or negations
of atomic sentences). The choice of $\mathcal{D}$ depends on the application, and corresponds
to the simple observations one can make. Members of $\mathcal{A}$, on the other hand, are the
formulas which can be used to express some background knowledge. Suppose for instance
that we have a data collector for tide heights; then, assuming a reasonable choice
of $\mathcal{V}$, $\mathcal{D}$ could be defined as the set of formulas of the form
tide_height(time, location) > low or $\neg$(tide_height(time, location) > high),
where time, location, low and high are $\mathcal{V}$-terms, whereas $\mathcal{A}$ could contain
some facts and rules about arithmetic.

The set of possible data true in a possible world $\mathfrak{M}$ is called the $\mathcal{D}$-diagram
of $\mathfrak{M}$, and is denoted by $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$; hence
$\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) = \{\varphi \in \mathcal{D} : \mathfrak{M} \models \varphi\}$. A
useful starting point for logical investigation is a generalization of the consistent
theories of classical logic; it is a possible knowledge base, defined as a set of the
form $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) \cup A$ where $\mathfrak{M}$ is a possible world and $A$ is a set of possible
assumptions (formulas in $\mathcal{A}$) true in $\mathfrak{M}$. We denote by $\mathcal{B}$ the set of possible
knowledge bases. Hence $\mathcal{B}$ is a derived parameter of the logical paradigm $\mathcal{P}$ and

$\mathcal{B} = \{\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) \cup A : \mathfrak{M} \in \mathcal{W},\ A \subseteq \mathcal{A},\ \mathfrak{M} \models A\}$.

Intuitively, all possible data true in $\mathfrak{M}$ will eventually be observed, or measured,
and some extra background knowledge about $\mathfrak{M}$ can be added to any 'natural'
theory that (partially) describes $\mathfrak{M}$. Therefore, the intended models of a
possible knowledge base (member of $\mathcal{B}$) $T$ are the possible worlds $\mathfrak{M}$ such that
$\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) = T \cap \mathcal{D}$ and $\mathfrak{M} \models T \cap \mathcal{A}$. Both conditions imply that $\mathfrak{M}$ is a model
of $T$, but a closed world assumption is applied to the possible data: a possible
datum (member of $\mathcal{D}$) that does not belong to $T$ has to be false in every intended
model of $T$, whereas a possible assumption (member of $\mathcal{A}$) that does not belong
to $T$ can be either true or false (unless it is implied by the other members of
$T$) in $T$'s intended models. This notion of intended model is at the root of the
generalized notion of logical consequence of Parametric logic, defined next:

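Definition 1 Let $T \in \mathcal{B}$ and $\varphi \in \mathcal{L}$ be given. We say that $\varphi$ is a logical
consequence of $T$ in $\mathcal{P}$, and we write $T \models_{\mathcal{W}}^{\mathcal{D}} \varphi$, just in case for all $\mathfrak{M} \in \mathcal{W}$, if $\mathfrak{M} \models T$
and $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) = T \cap \mathcal{D}$ then $\mathfrak{M} \models \varphi$.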

Intuitively, the class of possible interpretations is restricted and a closed world
assumption is applied to possible data but not to assumptions, which has the effect
of selecting a class of intended models. When $\mathcal{P}$ is the paradigm of classical
first-order logic (i.e., $\mathcal{W}$ is the class of all $\mathcal{V}$-structures, $\mathcal{L}$ is the set of first-order
$\mathcal{V}$-sentences, $\mathcal{D} = \emptyset$, and $\mathcal{A} = \mathcal{L}$), $\models_{\mathcal{W}}^{\mathcal{D}}$ is nothing but $\models$. For other choices
of values of the parameters (in particular, the values selected for LR), logical
consequence in $\mathcal{P}$ is stronger than classical logical consequence; it is not compact,
and accounts for various kinds of inference. We define three kinds of inference:
deductive, inductive, and limiting inference. Deductive inferences in $\mathcal{P}$ are
characterized by the compactness property: they are conclusive inferences on the
basis of a finite subset of the underlying knowledge base.
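
To make the closed world assumption on data concrete, the following is a minimal sketch (not part of LR or RichProlog) that compares classical consequence with the generalized consequence of Definition 1 in a toy propositional setting. The atoms `p` and `q`, the encoding of formulas as tuples, and all function names are assumptions chosen for illustration only.

```python
from itertools import combinations

# Toy setting (an assumption for illustration): two propositional atoms,
# every subset of atoms is a possible world, and both atoms are possible data.
ATOMS = ("p", "q")
P, Q = ("atom", "p"), ("atom", "q")
NOT_Q = ("not", Q)


def all_worlds(atoms):
    """Every world is the set of atoms it makes true."""
    return [frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]


def holds(world, formula):
    """Truth of a formula built from ("atom", name) and ("not", formula)."""
    kind, arg = formula
    if kind == "atom":
        return arg in world
    if kind == "not":
        return not holds(world, arg)
    raise ValueError("unknown connective: %r" % (kind,))


def classical_consequence(theory, phi, worlds):
    """phi holds in every model of the theory."""
    return all(holds(w, phi) for w in worlds
               if all(holds(w, t) for t in theory))


def parametric_consequence(theory, phi, worlds, data):
    """phi holds in every intended model: a model of the theory whose
    diagram (the possible data it makes true) is exactly theory & data."""
    visible = set(theory) & set(data)
    for w in worlds:
        is_model = all(holds(w, t) for t in theory)
        diagram = {d for d in data if holds(w, d)}
        if is_model and diagram == visible and not holds(w, phi):
            return False
    return True


WORLDS = all_worlds(ATOMS)
THEORY = [P]        # the knowledge base contains the datum p only
DATA = [P, Q]       # p and q are both possible data

print(classical_consequence(THEORY, NOT_Q, WORLDS))         # False
print(parametric_consequence(THEORY, NOT_Q, WORLDS, DATA))  # True
```

Because `q` is a possible datum absent from the knowledge base, it is false in every intended model, so the negation of `q` follows in the generalized sense although it does not follow classically.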

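Definition 2 Let $T \in \mathcal{B}$ and $\varphi \in \mathcal{L}$ be such that $T \models_{\mathcal{W}}^{\mathcal{D}} \varphi$. We say that $\varphi$ is a
deductive consequence of $T$ in $\mathcal{P}$ iff there exists a finite subset $D$ of $T$ such that
for all $T' \in \mathcal{B}$, if $D \subseteq T'$ then $T' \models_{\mathcal{W}}^{\mathcal{D}} \varphi$.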

When $\mathcal{P}$ is the paradigm of first-order logic, '$T \models \varphi$' and '$\varphi$ is a deductive
consequence of $T$ in $\mathcal{P}$' are equivalent notions. More generally, for any logical
paradigm $\mathcal{P}$, if $T \models \varphi$ then $\varphi$ is a deductive consequence of $T$ in $\mathcal{P}$, but the
converse is not always true; this is an instance of the universal query problem [11].
Inductive inferences in $\mathcal{P}$ are characterized by a property of weak compactness:
they are not conclusive inferences, but they can be conclusively refuted.

Definition 3 Let $T \in \mathcal{B}$ and $\varphi \in \mathcal{L}$ be such that $T \models_{\mathcal{W}}^{\mathcal{D}} \varphi$. We say that $\varphi$ is an
inductive consequence of $T$ in $\mathcal{P}$ iff there exists a finite $D \subseteq T$ such that for all
$T' \in \mathcal{B}$, if $D \subseteq T'$ and $T' \not\models_{\mathcal{W}}^{\mathcal{D}} \varphi$ then $\neg\varphi$ is a deductive consequence of $T'$ in $\mathcal{P}$.

Note that in the definition above, the discovery of $\neg\varphi$ from $T'$ is a refutation of
$\varphi$ from $T'$ (from some finite subset $D'$ of $T'$ which might be different from $D$).

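Definition 4 Let $T \in \mathcal{B}$ and $\varphi \in \mathcal{L}$ be such that $T \models_{\mathcal{W}}^{\mathcal{D}} \varphi$. We say that $\varphi$ is
a limiting consequence of $T$ in $\mathcal{P}$ iff there exists a finite subset $D$ of $T$ and a
member $\psi$ of $\mathcal{L}$ such that $\psi$ is an inductive consequence of $T$ in $\mathcal{P}$ and for all
$T' \in \mathcal{B}$, if $D \subseteq T'$ and $T' \models_{\mathcal{W}}^{\mathcal{D}} \psi$ then $T' \models_{\mathcal{W}}^{\mathcal{D}} \varphi$.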

This more complex inference corresponds to an inductive inference followed
by a deductive inference. Intuitively, in Definition 4, $\varphi$ is a conclusive inference
from a formula $\psi$ which represents an inductive leap. Links with inductive
inference can be established: deductive consequences in $\mathcal{P}$ can be discovered
with no mind change, inductive consequences in $\mathcal{P}$ can be discovered with at
most one mind change, and limiting consequences in $\mathcal{P}$ can be discovered in the
limit [16]. Essential to LR is the relationship between limiting consequences in
$\mathcal{P}$ and identification in the limit; this relationship is expressed in Proposition 7
below. Further, we can show the existence of a whole hierarchy of inferences with
less than $\beta$ mind changes where $\beta$ is a non-null ordinal; deductive and inductive
inferences are the particular cases where $\beta = 1$ and $\beta = 2$, respectively. Under
some assumptions, given a sentence $\varphi$, $\varphi$ is a limiting consequence in $\mathcal{P}$ of any
$T \in \mathcal{B}$ where $T \models_{\mathcal{W}}^{\mathcal{D}} \varphi$ if and only if $\varphi$ is a $\Sigma_2$ sentence of a particular form,
which yields precisely the syntax of the queries for LR.

For example, assume that $\mathcal{V}$ consists of constant 0, unary function s, and
binary predicate R. For all $n \in \mathbb{N}$, let $\bar{n}$ denote the numeral ($\mathcal{V}$-term) that
represents n. Assume that $\mathcal{W}$ is the set of Herbrand models of: 'R is a strict
total ordering.' Finally, assume that $\mathcal{D}$ is the set of all atomic sentences of the
form $R(\bar{m}, \bar{n})$, for $m, n \in \mathbb{N}$. Consider the possible world $\mathfrak{M}$ whose domain is
$\mathbb{N}$ and such that $R^{\mathfrak{M}}$ is the strict total ordering (1, 3, 5, ..., 0, 2, 4, ...) (a copy of
$2\mathbb{N} + 1$ followed by a copy of $2\mathbb{N}$). The $\mathcal{D}$-diagram of $\mathfrak{M}$ contains $R(\bar{4}, \bar{8})$ and
$R(\bar{5}, \bar{2})$, but neither $R(\bar{7}, \bar{3})$ nor $R(\bar{8}, \bar{3})$. Set $\varphi_1 = \exists x\, R(\bar{2}, x)$, $\varphi_2 = \forall y\, \neg R(y, \bar{1})$,
and $\varphi_3 = \exists x\, \forall y\, \neg R(y, x)$. Then $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$ is a possible knowledge base and one
can immediately verify that $\varphi_1$ is a deductive consequence of $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$ in $\mathcal{P}$, $\varphi_2$
is an inductive consequence of $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$ in $\mathcal{P}$, and $\varphi_3$ is a limiting consequence
of $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$ in $\mathcal{P}$, taking $\psi = \varphi_2$ in Definition 4. The solution $x = \bar{1}$ to $\varphi_3$
viewed as a query can be computed in the limit, for instance by hypothesizing
that 0 is the least element, before this wrong hypothesis is refuted and it is
correctly conjectured that 1 is the least element. For this particular $\mathfrak{M}$, only 1
mind change is needed, but note that up to n mind changes might be needed if
n is the least element, and if numerals are conjectured in their natural order.
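
The conjecture-and-refute behaviour in this example can be sketched as follows. This is only an illustrative learner, not the LR proof procedure: it assumes data arrive as pairs (m, n) standing for the observed fact R(m, n), and it conjectures numerals in their natural order. The function names and the particular data stream are hypothetical.

```python
def least_element_conjectures(facts):
    """Yield the current conjecture for the least element after each datum.

    Each datum (m, n) stands for an observed fact R(m, n), which refutes
    the hypothesis that n is least. The conjecture is always the smallest
    natural number not yet refuted.
    """
    refuted = set()
    for _m, n in facts:
        refuted.add(n)
        candidate = 0
        while candidate in refuted:
            candidate += 1
        yield candidate


def mind_changes(conjectures):
    """Count how many times consecutive conjectures differ."""
    return sum(1 for a, b in zip(conjectures, conjectures[1:]) if a != b)


# A finite initial segment of some enumeration of the diagram of the world
# ordered as 1, 3, 5, ..., 0, 2, 4, ... (all five facts are true there).
STREAM = [(4, 8), (5, 2), (1, 0), (1, 3), (3, 5)]

conjectures = list(least_element_conjectures(STREAM))
print(conjectures)                # [0, 0, 1, 1, 1]
print(mind_changes(conjectures))  # 1
```

The conjecture 1 is never abandoned afterwards, because no fact of the form R(m, 1) is true in this world; the learner nevertheless has no way to announce that it has converged.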

Most problems allow the extraction of a background knowledge K, with the
property that $\mathcal{W}$ is the set of Herbrand models of K. With the previous example,
we would define K as a set of sentences expressing that R is a strict total
ordering. For most problems, K is finite whereas the $\mathcal{D}$-diagram of a possible
world is infinite. LR has access to K and a finite subset of $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$, for some
member $\mathfrak{M}$ of $\mathcal{W}$; the key point is that only a finite (though unknown) subset
of $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$ is necessary to make a correct inference, with larger subsets of
$\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$ yielding the same result.



3 Limiting Resolution 

3.1 Limiting Definite Programs and Queries 

Limiting Resolution, described below, is a previously unpublished resolution-style
system for performing inference according to Parametric logic semantics.
Let us define a general rule as a formula that can be represented as $\varphi \leftarrow \psi$, where
$\varphi$ is atomic and $\psi$ is quantifier free and in negation normal form (i.e., is built
from the set of literals using disjunction and conjunction), using the standard
logic programming abbreviations. The basic parts of a formula $\varphi$ are the literals
that occur in $\varphi$³. Given two sets X and Y of literals, we say that X underlies Y
iff all closed instances of all members of X belong to Y. Intuitively, X underlies
Y if the set X is syntactically more general than Y. Let a literal $\varphi$ and a set X
of literals be given. Let $\psi$ denote the atomic formula such that $\varphi = \psi$ or $\varphi = \neg\psi$.
We say that $\varphi$ is related to X iff there exists an atomic formula $\xi$ such that $\psi$
and $\xi$ unify, and $\xi$ or $\neg\xi$ belongs to X. This is a particular case of a notion that
has been considered in the literature on relevant logics [1]. Intuitively, if $\varphi$ is not
related to X then $\varphi$ and X cannot interact.

A limiting definite program is to LR what a definite logic program is to 
SLD-resolution, and hence is more general. In the context of Parametric logic, 
limiting definite programs are a particular kind of possible knowledge base, in a
logical paradigm $\mathcal{P}$ satisfying the following:

- $\mathcal{V}$ contains at least one constant.

- $\mathcal{D}$ is a set of ground literals; it might consist of nothing but ground atoms
  (for problems where only positive data are available), or ground literals (for
  problems where both positive and negative data are available).

- $\mathcal{A}$ is a set of general rules such that:

  • for all $\varphi \in \mathcal{A}$, the head of $\varphi$ is not related to $\mathcal{D}$;

  • for all $\varphi \in \mathcal{A}$ and for all basic parts $\psi$ of the body of $\varphi$, either $\psi$ is not
    related to $\mathcal{D}$ and $\psi$ is atomic, or $\psi$ is related to $\mathcal{D}$ and $\psi$ underlies $\mathcal{D}$.

- $\mathcal{W}$ is the class of Herbrand models of $\mathcal{A}$.

Hence we consider the very particular case of a logical paradigm $\mathcal{P}$ where all
possible assumptions are true in all possible worlds. This means that the whole
of $\mathcal{A}$ can play the role of background knowledge. Though the set $\mathcal{B}$ of possible
knowledge bases is equal to $\{\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) \cup A : \mathfrak{M} \in \mathcal{W},\ A \subseteq \mathcal{A}\}$, we focus on the
particular knowledge bases $T$ where $T \cap \mathcal{A}$ is maximal, hence equal to $\mathcal{A}$.

3 Of course, if $\alpha$ is an atomic formula and all occurrences of $\alpha$ in $\varphi$ are preceded by
negation, then $\alpha$ is not a basic part of $\varphi$.

Definition 5 A limiting definite program (in $\mathcal{P}$) is a member of $\mathcal{B}$ of the form
$\mathcal{A} \cup \mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$, for some $\mathfrak{M} \in \mathcal{W}$.

For instance, assume that $\mathcal{V}$ contains unary predicate symbols $p_1, p_2, q_1, q_2, q_3$.
Suppose that $\mathcal{D} = \{q_1(t), q_2(t), \neg q_2(t), \neg q_3(t) : t \text{ is a ground term}\}$. Then:

- $p_1(X) \leftarrow q_1(X), p_2(Z), q_2(X), \neg q_2(Z), \neg q_3(Y)$ can be a member of $\mathcal{A}$;

- $p_1(X) \leftarrow q_1(X), \neg p_2(Z), q_2(X), \neg q_2(Z), q_3(Y)$ cannot be a member of $\mathcal{A}$
  (because of either $\neg p_2(Z)$ or $q_3(Y)$).
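
The two conditions on rules can be checked mechanically. The sketch below is a simplification made for illustration: it assumes unary predicates applied to variables and a data set given schematically (a pair such as `(False, "q3")` stands for every ground instance of the negative literal over `q3`), so that "related" reduces to sharing a predicate symbol and "underlies" reduces to membership in the schema. The encoding and all function names are hypothetical.

```python
# Schematic form of the data set from the example: (is_positive, predicate).
D_SCHEMA = {(True, "q1"), (True, "q2"), (False, "q2"), (False, "q3")}


def related(literal, d_schema):
    """The literal's predicate occurs in some possible datum."""
    _, pred = literal
    return any(p == pred for _, p in d_schema)


def underlies(literal, d_schema):
    """Every ground instance of the literal is a possible datum."""
    return literal in d_schema


def body_literal_ok(literal, d_schema):
    """Atomic and unrelated to the data, or related to and underlying it."""
    positive, _ = literal
    if related(literal, d_schema):
        return underlies(literal, d_schema)
    return positive


def admissible_rule(head, body, d_schema):
    """Head must be an atom unrelated to the data; every body literal must pass."""
    return (head[0] and not related(head, d_schema)
            and all(body_literal_ok(lit, d_schema) for lit in body))


def offending(body, d_schema):
    """Body literals that violate the condition."""
    return [lit for lit in body if not body_literal_ok(lit, d_schema)]


HEAD = (True, "p1")
RULE_1 = [(True, "q1"), (True, "p2"), (True, "q2"), (False, "q2"), (False, "q3")]
RULE_2 = [(True, "q1"), (False, "p2"), (True, "q2"), (False, "q2"), (True, "q3")]

print(admissible_rule(HEAD, RULE_1, D_SCHEMA))  # True
print(admissible_rule(HEAD, RULE_2, D_SCHEMA))  # False
print(offending(RULE_2, D_SCHEMA))              # [(False, 'p2'), (True, 'q3')]
```

The two offending literals reported for the second rule are the ones named in the example: the negated theoretical atom over `p2` and the positive literal over `q3`, whose ground instances are not possible data.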

The heads of the members of $\mathcal{A}$, and the basic parts in the bodies of the members
of $\mathcal{A}$ that are not related to $\mathcal{D}$, should be thought of as theoretical atoms. The
other formulas that occur in the bodies of the members of $\mathcal{A}$ are formulas such
that all their closed instances are possible data. Note that singleton clauses can
be either theoretical atoms, or literals all of whose closed instances are members
of $\mathcal{D}$. A simple mechanism to ensure that theoretical atoms and members of the
set $\mathcal{D}$ of possible data cannot interact is to partition the set of predicate symbols
in $\mathcal{V}$ into two classes, evidential and theoretical predicate symbols, and require
that a member of $\mathcal{D}$ (resp. member of $\mathcal{A}$) be built from an evidential (resp.
theoretical) predicate symbol. LR solves limiting definite queries in the context
of limiting definite programs. Limiting definite queries are more general than
definite queries since they are $\Sigma_2$ sentences of a particular form. To define them
we need a preliminary concept. A literal $\alpha$ is $\mathcal{D}$-simple iff:

- $\alpha$ is positive and not related to $\mathcal{D}$, or

- $\alpha$ underlies $\mathcal{D}$.

Definition 6 A sentence $\varphi$ is a limiting definite query (in $\mathcal{P}$) iff it is of the
form $\exists \bar{x}\,(\psi(\bar{x}) \wedge \forall \bar{y}\, \neg\chi(\bar{x}, \bar{y}))$ where $\psi$ and $\chi$ are quantifier free and in negation
normal form, and all basic parts of $\psi$ and $\chi$ are $\mathcal{D}$-simple.

Assume that $\mathcal{V}$ and $\mathcal{D}$ are defined as in the example after Definition 5. Then in Definition 6, $\psi$
cannot be $\neg p_1(X_1) \wedge q_3(X_1)$ (because of either $\neg p_1(X_1)$ or $q_3(X_1)$) and $\chi$ cannot
be $(\neg p_2(Y_1) \vee \neg q_1(Y_2)) \wedge (q_2(Y_2) \vee \neg q_3(Y_2))$ (because of either $\neg p_2(Y_1)$ or $\neg q_1(Y_2)$).
For a more complex example, if $\mathcal{D}$ is defined as $\{\mathrm{observed}(t) : t \text{ is a ground term}\}$, then

$Q = \exists P\, \exists L\, [\mathrm{pattern}(P) \wedge \mathrm{list}(L) \wedge \mathrm{same\_length}(P, L) \wedge \mathrm{match}(P, L) \wedge
\forall W\, \neg(\mathrm{word}(W) \wedge \mathrm{observed}(W) \wedge (\mathrm{smaller\_length}(P, W) \vee \mathrm{mismatch}(P, W)))]$

is a limiting definite query. Solving Q means computing a witness for (P, L) that
makes the query a logical consequence in $\mathcal{P}$ of the union of $\mathcal{A}$ (not shown here)
with a possibly infinite set of sentences of the form observed(t).

3.2 Limiting Resolution 

The following result shows that we can discover in the limit whether a limiting
definite query is a logical consequence in $\mathcal{P}$ of a limiting definite program.

Proposition 7 Assume that $\mathcal{P}$ satisfies the conditions stated in Section 3.1.
There exists a computable function f that maps finite subsets of $\mathcal{D}$ into {0, 1}
such that for all limiting definite queries $\varphi$ and for all $\mathfrak{M} \in \mathcal{W}$, the following are
equivalent.

- $\mathcal{A} \cup \mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) \models_{\mathcal{W}}^{\mathcal{D}} \varphi$⁴;

- for all enumerations $(e_i)_{i \in \mathbb{N}}$ of $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) \cup \{\sharp\}$, $f(\{e_0, \ldots, e_i\} \setminus \{\sharp\}) = 1$ for
  all but finitely many $i \in \mathbb{N}$⁵.

Proposition 7 is a very particular case of a much more general result of
Parametric logic. LR is a particular proof procedure that discovers in the limit
whether a limiting definite query $\varphi$ is a logical consequence in $\mathcal{P}$ of a limiting
definite program, and that computes in the limit witnesses to the existentially
quantified variables of $\varphi$.

LR proofs are constructed in a similar manner to SLD and negation-as-failure
formalisms by constructing trees or forests whose nodes are labelled with formulas.
We describe the system under the constraint that programs are restricted to
clauses, and queries are in the form $\exists \bar{x}\,(\psi(\bar{x}) \wedge \forall \bar{y}\, \neg\chi(\bar{x}, \bar{y}))$ with $\psi$ and $\chi$ being
conjunctions of $\mathcal{D}$-simple literals. This description can be extended to the programs
and queries of Definitions 5 and 6 respectively (which include disjunction)
by using a mechanism similar to the Lloyd-Topor transformation[13]. There is
no conceptual difference between the simplified and more general version.

Take some possible world $\mathfrak{M}$, and consider $\Delta \subseteq \mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$. Assume that
$\mathcal{A}$ is some set of clauses over $\mathcal{D}$-simple literals. Now $P = \mathcal{A} \cup \Delta$ is a fragment
of the limiting definite program $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) \cup \mathcal{A}$. Let Q be a limiting definite
query restricted to conjunctions. We define an LR triple for (P, Q) as a triple
$\mathfrak{T} = (T, I, s)$, with the following properties: T is a tree, called the deductive tree
of the LR triple, whose nodes are labelled with finite sets containing $\mathcal{D}$-simple
literals and at most one negation of a conjunction of $\mathcal{D}$-simple literals; I is a
collection of trees, called the inductive trees of the LR triple, with finite sets of
$\mathcal{D}$-simple literals labelling the nodes; and s is a partial mapping from leaves in
T to trees in I. Given an LR triple $\mathfrak{T}$, an LR triple $\mathfrak{T}'$ which extends $\mathfrak{T}$ is defined
inductively as follows. Take a leaf N from T, with label M, and select via some
selection rule a $\mathcal{D}$-simple literal C in M; if no $\mathcal{D}$-simple literal is present then, if
available, the negation of the conjunction of $\mathcal{D}$-simple literals will be selected as
C. We attach a new component to $\mathfrak{T}$ to create $\mathfrak{T}' = (T', I', s')$, where T' extends
T, I' expands I (i.e., by extending trees in I or adding new trees to I), and s'
expands s depending on the following rules:

- If C is a $\mathcal{D}$-simple literal in some tree in $\{T\} \cup I$, and there is some rule
  $C' \leftarrow \bigwedge L \in P$, with L possibly empty, such that C and C' unify, then create
  a new node labelled with $((M \setminus \{C\}) \cup L)\theta$ below N for every appropriate
  $C' \leftarrow \bigwedge L \in P$, where $\theta$ is the mgu of C and C'. The mapping s' is updated
  appropriately.

- If C is a negation of a conjunction of $\mathcal{D}$-simple literals and s(C) is not defined,
  then add a new single node tree U to I to create I', with the conjuncts in C
  labelling the root of U, and expand s to s' by setting s'(C) = U.

- If neither rule applies, then $\mathfrak{T}' = \mathfrak{T}$.

4 Since all members of $\mathcal{A}$ are assumed to be true in all members of $\mathcal{W}$, it is clear that
$\mathcal{A} \cup \mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) \models_{\mathcal{W}}^{\mathcal{D}} \varphi$ is equivalent to $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) \models_{\mathcal{W}}^{\mathcal{D}} \varphi$.

5 We denote by $\sharp$ an extra symbol, whose intended meaning is 'no datum provided,'
that is necessary for the particular case where $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) = \emptyset$.

We then define an LR forest for (P, Q) as a sequence of LR triples for (P, Q)
$(\mathfrak{T}_0, \mathfrak{T}_1, \mathfrak{T}_2, \ldots)$ where:

- the deductive tree of $\mathfrak{T}_0$ is a root labelled with the original query and the
  set of inductive trees of $\mathfrak{T}_0$ is empty;

- for all $i \in \mathbb{N}$, $\mathfrak{T}_{i+1}$ is an extension of $\mathfrak{T}_i$;

- whenever $i \in \mathbb{N}$ and a leaf N in one of the inductive trees of $\mathfrak{T}_i$ is neither
  a success nor a failure node in the sense of SLD-resolution, there exists some
  $j > i$ such that N is an interior node in all the inductive trees of $\mathfrak{T}_j$.

A maximal LR triple is defined as an LR triple to which no non-identity
extension applies. An LR triple is successful iff either its deductive tree
contains a node labelled with the empty clause or the LR triple contains at
least one inductive tree for which no node is labelled with the empty
clause. An LR forest $(\mathfrak{T}_i)_{i \in \mathbb{N}}$
is successful iff cofinitely many of its LR triples are successful. The system
calculates computed answer substitutions on successful LR triples identically to
SLD-resolution on the deductive trees, with the additional condition that negated
conjunctions which label leaves of the deductive tree are considered successful
with the empty answer substitution. The computed answer substitutions of
a successful LR forest are defined as the substitutions that are the computed
answer substitutions of cofinitely many successful LR triples.

For example, consider the set of formulas $\mathcal{A}$ = {num(0), num(s(X)) $\leftarrow$
num(X), lt(0, s(X)), lt(s(X), s(Y)) $\leftarrow$ lt(X, Y)} where $\mathcal{D} = \{\mathrm{obs}(s^n(0)) : n \in \mathbb{N}\}$,
and $\Delta = \{\mathrm{obs}(s^n(0)) : n \geq 1\} = \mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$, and the program $P = \mathcal{A} \cup \Delta$.
We ask the query $\exists X\,(\mathrm{num}(X) \wedge \forall Y\, \neg(\mathrm{obs}(Y) \wedge \mathrm{lt}(Y, X)))$. Figure 1 shows a
diagrammatic representation of the computation, which has solutions X = 0
and X = s(0). This diagram is infinite; for building practical systems we need
the following key result:

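Proposition 8 Let a limiting definite query $Q = \exists \bar{x}\,(\psi(\bar{x}) \wedge \forall \bar{y}\, \neg\chi(\bar{x}, \bar{y}))$, a tuple
of terms $\bar{t}$, and $\mathfrak{M} \in \mathcal{W}$ be such that $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) \cup \mathcal{A} \models_{\mathcal{W}}^{\mathcal{D}} \psi(\bar{t}) \wedge \forall \bar{y}\, \neg\chi(\bar{t}, \bar{y})$. Then
there exists a successful LR forest $(\mathfrak{T}_i)_{i \in \mathbb{N}}$ for $(\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) \cup \mathcal{A}, Q)$, a tuple of
terms $\bar{t}'$, and a finite subset $\Delta$ of $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$ with the following properties:

- $\bar{t}'$ is a computed answer substitution of $(\mathfrak{T}_i)_{i \in \mathbb{N}}$ at least as general as $\bar{t}$.

- Let $\Delta'$ be such that $\Delta \subseteq \Delta' \subseteq \mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$. For all $i \in \mathbb{N}$, let $\mathfrak{T}'_i$ be the
  maximal LR triple for $(\Delta' \cup \mathcal{A}, Q)$ such that $\mathfrak{T}_i$ extends $\mathfrak{T}'_i$. Then $(\mathfrak{T}'_i)_{i \in \mathbb{N}}$
  is a successful LR forest for $(\Delta' \cup \mathcal{A}, Q)$ having $\bar{t}'$ as a computed answer
  substitution.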



[Figure 1: diagram not reproduced. Node labels recoverable from the original include the root
{num(X), ¬(obs(Y) ∧ lt(Y, X))} of the deductive tree, the nodes {¬(obs(Y) ∧ lt(Y, 0))} and
{num(X), ¬(obs(Y) ∧ lt(Y, s(X)))}, and the inductive-tree roots {obs(Y), lt(Y, 0)},
{obs(Y), lt(Y, s(0))} and {obs(Y), lt(Y, s²(0))}.]

Fig. 1. Limiting Resolution Forest

This construction and proposition provide a logic programming analogue to
the function f discussed in Proposition 7 above. Discovery of $\bar{t}'$ from some finite
$\Delta \subseteq \mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$ corresponds to eventually finding some witness which remains
a witness for the existentially quantified query variables as $\Delta$ is progressively
extended to $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$. Assume that the members of $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$ are enumerated
one by one. LR will then be run on the background knowledge $\mathcal{A}$ plus some finite
initial segment of the enumeration of $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$. At the start of our computation
$\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$ is unknown; however, the computation will necessarily converge in
the limit to one of the same witnesses that $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$ would have produced.
Consequently it is not necessary to traverse the entire (generally infinite) forest
for $(\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M}) \cup \mathcal{A}, Q)$, but only some finite portion of the forest. Usually, it
is not possible to know in advance how many data will be necessary for LR to
converge to the right answer.
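
The convergence behaviour can be illustrated on the num/obs/lt example. The sketch below does not build an LR forest; it evaluates the query semantically over natural numbers, which is an assumption made to keep the illustration short. Numerals are modelled as Python integers, candidate witnesses are cut off at a hypothetical bound, and the enumeration order of the data is arbitrary.

```python
def witnesses(observed, bound):
    """Values X below `bound` such that no observed Y satisfies Y < X.

    This mirrors the query: num(X) and, for all Y, not (obs(Y) and lt(Y, X)),
    with the closed world assumption applied to the obs data seen so far.
    """
    return [x for x in range(bound) if not any(y < x for y in observed)]


def witnesses_per_segment(stream, bound):
    """Witness sets computed on each finite initial segment of the stream."""
    seen = set()
    results = []
    for datum in stream:
        seen.add(datum)
        results.append(witnesses(seen, bound))
    return results


# An initial segment of some enumeration of {obs(s^n(0)) : n >= 1}.
STREAM = [3, 2, 5, 1, 4]
history = witnesses_per_segment(STREAM, bound=6)

for segment_length, ws in enumerate(history, start=1):
    print(segment_length, ws)
# 1 [0, 1, 2, 3]
# 2 [0, 1, 2]
# 3 [0, 1, 2]
# 4 [0, 1]
# 5 [0, 1]
```

Once obs(s(0)) has been seen, the tentative witnesses settle on 0 and s(0) and no later datum can remove them, yet nothing in the finite segment itself certifies that convergence has occurred.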

To consider the relation with conventional logic programming semantics,
consider the case where $\mathcal{A}$ is a set of definite clauses, all members of $\mathcal{D}$ are atomic,
and T is a limiting definite program in $\mathcal{P}$. For all definite (as opposed to limiting
definite) queries $\varphi$, $T \models \varphi$ iff $T \models_{\mathcal{W}}^{\mathcal{D}} \varphi$. But when $\varphi$ is a limiting definite query,
$T \models \varphi$ is generally not equivalent to $T \models_{\mathcal{W}}^{\mathcal{D}} \varphi$. LR is a system that targets
formulas higher than $\Sigma_1$ in the arithmetical hierarchy, but w.r.t. the generalized
notion of logical consequence $\models_{\mathcal{W}}^{\mathcal{D}}$ rather than w.r.t. the classical notion
of logical consequence $\models$. This is in contrast with extensions of Prolog proposed
by [12].

4 Implementation 

4.1 Implementation Details 

RichProlog is our implementation of LR. Its execution proceeds by performing a 
depth-first search on the LR forest, attempting first to generate some tentative
witness or directly find a contradiction in the deductive tree of the forest, and 
subsequently attempting to construct some refutation of the tentative witness 
in an inductive tree of the forest. Failed tentative witnesses cause backtracking 
to other tentative witnesses. In the course of the execution there is an oppor- 
tunity for users to enter new members of D, at every stage when a computed 
answer substitution is proffered to the user. RichProlog converges in the limit to 
a correct witness (actually a most general unifier) for the existentially quantified 
variables in the query should one exist, and otherwise reports a failure. RichPro- 
log programs and queries are practical extensions of limiting definite programs 
and queries, in a similar manner to Prolog programs and queries being practi- 
cal extensions of SLD-resolution (with the exception of negation-as-failure, since 
negation is already part of LR). The system is supported by the completeness 
result of LR, but note that it uses an incomplete left-to-right depth-first-search 
selection rule, and only optionally uses an occurs check on account of the worst- 
case exponential cost. 

In the present implementation of RichProlog, the set D is defined implicitly. 
Evidential predicate symbols are those preceded with the quote symbol `, and
$\mathcal{D}$ will be derived from these predicates. The kind of (true) negation introduced
in RichProlog programs is indicated by preceding the atom to negate with rnot.
Note that D might not contain every ground atom built from an evidential pred- 
icate symbol. The list of evidential predicate symbols can be determined auto- 
matically by examining the program. Formulas built from an evidential predicate 
symbol can use any terms as arguments, but a type system described below may 
present additional restrictions on $\mathcal{D}$. Unlike Prolog, any solution presented by
RichProlog is tentative and can possibly be contradicted as a result of later data 
being discovered. 

Using a type system we can increase the scope of RichProlog queries. We 
permit queries of the form $\exists (x \in T)\, \forall y\, \varphi(x, y)$, where T defines some type, which
can be rewritten as $\exists x\,(x \in T \wedge \forall y\, \varphi(x, y))$. This effectively constructs a
generate-and-test strategy for the existentially quantified variables, and is convenient in
practice. Many type systems have been proposed for logic programming lan- 
guages. Among those, many only type variables, as we do, using the justification 
in [18], and here the syntax and unification rules we use are derived from this. 
The only notable departure is the addition of a type @var, to represent a variable 
in the enumeration of types. In general the variable type is required for complete- 
ness (since any constant in V can be replaced by a variable in an enumeration 
of all terms over V), but it can be convenient to exclude this on occasion. Given 
that the type system exists primarily to define V we have used the most limited 
type system available. We anticipate extending this to a more substantial type 
system with more static checking and stronger capabilities for type inference. 
The type system is based on a very simple constraint system, allowing variables 
to be unified only with terms having the same type as the variable. The current
implementation is made simple by use of attributed variables[7]. 

4.2 Example 

One might posit that the implementation of any complex algorithm will necessarily
require exponential time on account of the implicit RichProlog "generate-and-test"
strategy. We present a program for finding minimum spanning trees
(MSTs) in RichProlog based on Kruskal's algorithm[5] which executes in polynomial
time on finite graphs. The MST problem was chosen for being one of the
more complex problems possible to present thoroughly in a short paper which
demonstrates RichProlog features. This problem when restricted to finite graphs
has a well known O(E log E) upper bound on complexity, E being the number of
edges. If $\mathrm{Diag}_{\mathcal{D}}(\mathfrak{M})$ is finite, this problem corresponds to the conventional MST
problem; however, this implementation will also work for infinite cases. This MST
problem asks "does there exist some finite spanning tree S of a (possibly infinite)
graph G, such that for all spanning trees S', l(S) ≤ l(S'), where l is the sum
of the weights of edges in the graph". Solving this query directly appears to be
intractable, however. Kruskal's algorithm provides a tractable method, and can
be quickly summarized: sort the edges into ascending order of weights, and set
the partial MST S := ∅. Then for each edge e in the list, update S := S ∪ {e}
iff S ∪ {e} is acyclic.

exists-G ::: exists-T :::
    ordered_graph(G), subgraph(G,T),
    forall-E :::
        rnot(edge_of(E,G),
             ( edge_of_subgraph(E,T),
               not_kruskal_invariant(E,T)
             ; not_edge_of_subgraph(E,T),
               kruskal_invariant(E,T)
             )).

% E - edge of graph; G - graph
% T - candidate MST
% TP - transformed MST
kruskal_invariant(E,T) :-
    transform(E,T,TP),
    acyclic(TP).

not_kruskal_invariant(E,T) :-
    transform(E,T,TP),
    cyclic(TP).

`edge(n1-n2-3, 1). `edge(n1-n3-1, 2).
`edge(n1-n4-2, 3). `edge(n2-n3-7, 4).


Fig. 2. Query and Program Fragment for Kruskal’s Algorithm 
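
For comparison with the RichProlog fragment, the procedural summary of Kruskal’s algorithm given above can be sketched in ordinary Python. This is a minimal illustration rather than the authors’ implementation: the function name kruskal, the (u, v, weight) triple layout and the union-find cycle test are assumptions made here (the paper’s program uses depth-first cyclic/acyclic checks instead). The edge data are the four facts shown in Fig. 2.

```python
def kruskal(edges):
    """Return the edges kept by Kruskal's algorithm, lightest first.

    `edges` is an iterable of (u, v, weight) triples (an illustrative layout).
    """
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:              # adding e keeps S acyclic, so S := S + {e}
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# The four edges listed in Fig. 2: endpoints and weight.
EDGES = [("n1", "n2", 3), ("n1", "n3", 1), ("n1", "n4", 2), ("n2", "n3", 7)]
mst = kruskal(EDGES)
```

On this data the edge n2-n3 of weight 7 is rejected because it would close a cycle through n1.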



It is easy to construct a RichProlog query corresponding to this algorithm:
does there exist some subgraph $S$ of $G$ such that for all edges $e \in G$, $e$
belongs to $S$ iff $(e, S)$ satisfies the invariant of Kruskal’s algorithm, that is,
$T(e,S) = \{e' \in S : l(e') < l(e)\} \cup \{e\}$ is acyclic. The translation from this
informal description to a RichProlog query is straightforward (an excerpt is
shown in Figure 2), using predicates to represent “$e \in S$”, “$e \notin S$”, “$S$ is cyclic”
and “$S$ is acyclic”. Each subgraph of $G$ is represented by a list where, for example,
[in, out, in] would represent that the subgraph contains the first and
third edges of $G$ but that the second edge is absent. This allows for a simple
implementation of a transform/3 predicate in negation-free Prolog, which,
given some subgraph $S \subseteq G$ and edge $e \in G$, produces the set $T(e, S)$ described
above. The subgraph/2 predicate produces an enumeration of all subgraphs of
a graph. The edge_of/2 predicate is similar to a pure Prolog member/2 predicate,
and edge_of_subgraph/2 and not_edge_of_subgraph/2 are simple tests
to determine whether an edge is present in a subgraph or not. The predicates
cyclic/1 and acyclic/1 are conventional depth-first search checks for whether
some subgraph is cyclic or not. It is easy to verify that all these predicates can
be expressed as definite clauses enriched with a \= test. The data are presented
in the ‘edge/2 predicate, which is a two-place predicate to allow the easy
implementation of ordered_graph/1, which has the task of assembling an ordered
graph from the edges.
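
The declarative reading of the query can be checked by brute force on the small graph of Fig. 2. The sketch below is an illustration in Python, not RichProlog: the names is_acyclic, transform and satisfies_query are hypothetical stand-ins for the predicates described above, and the exhaustive enumeration with itertools.product plays the role of subgraph/2 without any of the pruning a real implementation would need.

```python
from itertools import product


def is_acyclic(edges):
    """Depth-first check that an undirected edge list contains no cycle."""
    adj = {}
    for i, (u, v, _) in enumerate(edges):
        adj.setdefault(u, []).append((v, i))
        adj.setdefault(v, []).append((u, i))
    seen = set()
    for root in adj:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, None)]      # (node, index of the edge used to reach it)
        while stack:
            node, via = stack.pop()
            for nxt, i in adj[node]:
                if i == via:
                    continue        # do not walk back along the same edge
                if nxt in seen:
                    return False    # a second path to a visited node: cycle
                seen.add(nxt)
                stack.append((nxt, i))
    return True


def transform(e, subgraph):
    """T(e, S): the edges of S lighter than e, together with e itself."""
    return [f for f in subgraph if f[2] < e[2]] + [e]


def satisfies_query(graph, flags):
    """For every edge e of G: e is in S iff T(e, S) is acyclic."""
    subgraph = [e for e, flag in zip(graph, flags) if flag == "in"]
    return all((flag == "in") == is_acyclic(transform(e, subgraph))
               for e, flag in zip(graph, flags))


# Edges of Fig. 2, ordered by weight as ordered_graph/1 is described to do.
GRAPH = sorted([("n1", "n2", 3), ("n1", "n3", 1), ("n1", "n4", 2), ("n2", "n3", 7)],
               key=lambda e: e[2])
solutions = [flags for flags in product(["in", "out"], repeat=len(GRAPH))
             if satisfies_query(GRAPH, flags)]
```

Exactly one in/out list satisfies the invariant for every edge, and it marks the three lightest edges as present and the weight-7 edge as absent.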

Some key areas should be noted where the implementation is impure - these 
occur in the body of the program which is elided for brevity. The program 
conducts several does-not-unify (i.e. \=) tests. Since the vocabulary is known 
and finite, and the intended interpretations are Herbrand, this can be replaced 
with an explicit does-not-unify predicate for this vocabulary. As these are used to 
check either the non-unification of two ground variables, or the non-unification of 
a variable and [] , asymptotically there will be no cost difference between a logical 
implementation and \=, given a good predicate indexing system in the underlying 
Prolog. The sort/2 predicate is used, which is avoidable by choosing an order 
on atoms, and Prolog-style arithmetic is also employed, which could be replaced
by Peano arithmetic or a constraint system. To simplify the implementation
and subsequent argument we require that all edges be of different length; this 
is an inessential restriction. We suggest that all of these impure aspects of the 
implementation are straightforwardly overcome. 

4.3 Cost 

While it would appear that this implementation has exponential running time as 
presented, due to the exhaustive enumeration of subgraphs in the subgraph/2 
predicate, a tabling-based technique can dramatically reduce the search space.
If any inductive tree succeeds (i.e. its calling query in the deductive phase fails), 
then any query subsumed by the label of the root of this inductive tree will 
also cause a failure. If the implementation ensures that the traversal of the 
deductive tree finds more general nodes before less general nodes, and tables or 
memoizes these more general nodes upon failure, then the traversal of the less 
general nodes in the tree can be avoided. There is a cost: at each inference step 
involving variables which appear in the nodes mapped to inductive trees, we 
need to test if the variables are subsumed by the table. This technique resembles 
the tabling technique of SLG resolution and other similar systems [4]. RichProlog 
tables n-tuples of terms (with n the number of existentially quantified variables) 
rather than subgoals, and makes a tabling check at each unification operation 
which concerns these variables rather than at each subgoal. This is of course 
more expensive, but seems to work well in practice. 

We give a sketch of a proof that this implementation has polynomial time
cost, even allowing for the additional tabling overheads. This sketch covers
the key points of a formal proof, omitting the precise details of the underlying
machine formalism. Consider first the cost of the negated part of the
query inside the rnot statement. The selection of an edge from a list of edges
of size $E$, and the check for absence of an edge from such a list in edge_of,
edge_of_subgraph and not_edge_of_subgraph, are in $O(E)$, and the depth-first
searches in cyclic/acyclic can traverse each edge at most once, so these will
again be in $O(E)$. The transform/3 predicate can also be performed by conducting
a linear scan of a subgraph and so has a bound in $O(E)$. Thus this
part of the algorithm has a worst-case bound in $7 \cdot O(E) = O(E)$. This is the
time to consider a single edge being present or absent given a particular subgraph.
The subgraph/2 predicate produces subgraphs in the following order:
[..., in | _], [..., out | _], in a general-to-specific ordering. Call the set of all
list terms with $i$ ground terms followed by a variable list element or the empty
list $T_i$, so for example [in, out | _] is in $T_2$. Assume that all lists which are not
initial segments of the list representation of some MST and which are shorter than
$i$ already have a tabled term which will cause them to fail. Assume that the
start of the list $L$ just constructed by subgraph/2 is a subgraph of the MST.
Exactly one of $L$ + [in] and $L$ + [out] will be a subpart of some MST, given that
all edges have distinct lengths; Kruskal’s algorithm guarantees that the correct
list is chosen. The other list will be tabled, and so all terms in $T_{i+1}$ which
are not sublists of the MST will have some element in the table which will cause
unification against them to fail. Furthermore, the empty list, the only member
of $T_0$, will be a subgraph of every MST. Thus the program supplemented with
tabling will consider at most 2 terms in $T_i$ for each $i$. By performing this operation
repeatedly, the program will determine the presence or absence of each
edge in turn, which will require in the worst case $2 \cdot O(E) = O(E)$ attempts at
the negated part of the query. For each $T_i$, of which there are $E$, there will be
a check of the negated part of the query which requires $O(E)$ unifications. This
entails an $O(E^2)$ upper bound on the number of unifications.
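
The claim that at most two candidate lists are examined per level can be illustrated with a deliberately simplified Python model. It is not RichProlog’s tabling mechanism: the function search, the list failed (standing in for the table of failed prefixes) and the union-find cycle test are all assumptions made for this sketch. It extends a prefix of in/out flags one edge at a time, lightest edge first, and counts how many extensions are tried.

```python
def is_acyclic(edges):
    """Union-find acyclicity test for an undirected edge list."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v, _ in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True


def search(graph):
    """Return (flags, tried, failed) for a level-by-level prefix search.

    Edge weights are assumed distinct, as in the text, so with the edges
    sorted by weight T(e, S) is the chosen lighter edges plus e.
    """
    prefix, chosen, tried, failed = [], [], 0, []
    for e in sorted(graph, key=lambda x: x[2]):
        for flag in ("in", "out"):
            tried += 1
            if (flag == "in") == is_acyclic(chosen + [e]):
                prefix.append(flag)          # invariant holds for this extension
                if flag == "in":
                    chosen.append(e)
                break
            failed.append(prefix + [flag])   # every list starting like this is pruned
    return prefix, tried, failed


EDGES = [("n1", "n2", 3), ("n1", "n3", 1), ("n1", "n4", 2), ("n2", "n3", 7)]
flags, tried, failed = search(EDGES)
```

For the four edges of Fig. 2 the search tries five extensions in total, within the bound of two per edge, and records a single failed prefix.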

Tabling modifies the unification operation, making it more expensive and no
longer an atomic operation for a cost analysis. In the worst case, the number of
tabled elements is $E$ (as each negated part of the query could contain a failure
case), and the elements of the table have maximum size in $O(E)$, making the cost
of a subsumption check against one element $O(E)$ on account of the linear bound
on a unification check without an occurs check, and so the worst-case cost of a
unification with tabling check is $O(E^2)$. The number of unifications involving a
tabling check will be less than or equal to the number of unifications required in
the execution of the entire program. In the normal execution of the algorithm, it
is possible for a unification to occur at each point in the execution, which gives a
simple upper bound on cost of $O(E^4)$, inside polynomial time.
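
The bound can be assembled from the quantities named in the sketch. The following only restates the argument and is not a formal proof; the labels $U$ (number of unifications) and $c$ (cost of one unification including its tabling check) are introduced here for illustration.

\[
U = \underbrace{E}_{\text{levels } T_i} \cdot \underbrace{O(E)}_{\text{unifications per check}} = O(E^2),
\qquad
c = \underbrace{E}_{\text{table entries}} \cdot \underbrace{O(E)}_{\text{subsumption check}} = O(E^2),
\qquad
U \cdot c = O(E^4).
\]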



5 Conclusion 

Unlike most formalisms, Parametric logic accounts for logical consequence re- 
lationships that are not compact. Some of the developments from this are the 
construction of a resolution framework as well as a Prolog-like system which can 
both account for tentative solutions and the requirement for subsequent mind 
changes. LR is designed to tackle computation in the limit, and guarantees that
after some finite computation a correct answer is constructed should one exist;
however, the system cannot necessarily determine when this point is reached. LR
is sound and complete with respect to a natural 2-valued semantics, and incorporates
a notion of true negation. It is possible to build a system which implements
real algorithms in polynomial time, which goes some way to demonstrating the 
practicality of the system. RichProlog is upwards compatible with existing Pro- 
log systems, and so we can expect that existing Prolog compiler technology will 
be usable in a RichProlog setting. 

This work has several possible extensions. The relationships between existing 
nonmonotonic reasoning formalisms need to be more precisely characterized; par- 
ticularly the precise relationship between existing negation-as-failure formalisms 
needs to be described beyond noting that LR is distinct from these formalisms. 
Also it may be fruitful to describe these formalisms in terms of mind change 
bounds. LR programs are currently restricted to positive theoretical literals, 
and finding an appropriate formalism to deal with these would be profitable. We 
have shown that RichProlog is useful in an example, and our aim is to apply it 
to more complex problems relevant to AI.

Abstract. The task of inverting logical entailment is of central importance
to the disciplines of Abductive and Inductive Logic Programming
(ALP & ILP). Bottom Generalisation (BG) is a widely applied approach 
for Inverse Entailment (IE), but is limited to deriving single clauses from 
a hypothesis space restricted by Plotkin’s notion of C-derivation. More- 
over, known practical applications of BG are confined to Horn clause 
logic. Recently, a hybrid ALP-ILP proof procedure, called HAIL, was 
shown to generalise existing BG techniques by deriving multiple clauses 
in response to a single example, and constructing hypotheses outside the 
semantics of BG. The HAIL proof procedure is based on a new semantics, 
called Kernel Set Subsumption (KSS), which was shown to be a sound 
generalisation of BG. But so far KSS is defined only for Horn clauses. 
This paper extends the semantics of KSS from Horn clause logic to gen- 
eral clausal logic, where it is shown to remain a sound extension of BG. A 
generalisation of the C-derivation, called a K*-derivation, is introduced 
and shown to provide a sound and complete characterisation of KSS. 
Finally, the K*-derivation is used to provide a systematic comparison of 
existing proof procedures based on IE. 



1 Introduction 

Abduction and induction are of great interest to those areas of Artificial In- 
telligence (AI) concerned with the tasks of explanation and generalisation, and 
efforts to analyse and mechanise these forms of reasoning are gaining in impor- 
tance. In particular, the disciplines of Abductive Logic Programming (ALP) [4] 
and Inductive Logic Programming (ILP) [8] have developed semantics and proof 
procedures of theoretical and practical value. Fundamentally, both ALP and ILP 
are concerned with the task, called Inverse Entailment (IE), of constructing a 
hypothesis that logically entails a given example relative to a given background 
theory. In practice, the main difference between ALP and ILP is that whereas 
abductive hypotheses are normally restricted to sets of ground atoms, inductive 
hypotheses can be general clausal theories. 

To date, the inference method of Bottom Generalisation (BG) [6, 15] is one of 
the most general approaches for IE to have resulted in the development of high- 
performance tools of wide practical application. Central to this success has been 
the use of Muggleton’s notion of Bottom Set (BS) [6] to bound a search space 
that would otherwise be intractable. However, methods based directly on BG
are subject to several key limitations. By definition they can only hypothesise a
single clause in response to a given example, and Yamamoto [14] has shown they
are further limited to deriving a class of hypotheses characterised by Plotkin’s
notion of C-derivation [10]. In practice, known proof procedures for BG are
limited to Horn clause logic, as evidenced, for example, by the state-of-the-art
ILP system Progol [6].

Recently, Ray et al. [12] proposed a hybrid ALP-ILP proof procedure, called 
HAIL, that extends the Progol approach by hypothesising multiple clauses in
response to a single example, and by constructing hypotheses outside the se- 
mantics of BG. Also in [12], a semantics for HAIL called Kernel Set Subsump- 
tion (KSS) was presented and shown to subsume that of BG. So far, this new 
semantics is defined only for Horn clause logic and, as yet, no corresponding 
characterisation of KSS has been found to generalise the relationship between 
BG and C-derivations. It was conjectured in [12], however, that a natural extension
of the C-derivation, called a K-derivation, could be used to obtain such a
characterisation of KSS. 

In this paper, the semantics of KSS is extended from Horn clauses to gen- 
eral clauses, where it is shown to remain a sound generalisation of BG. A new 
derivation is defined, called a K*-derivation, that both refines the K-derivation
and generalises the C-derivation. The K*-derivation is shown to give a sound 
and complete characterisation of KSS, thereby resolving the conjecture above. 
The paper is structured as follows. Section 2 reviews the relevant background 
material. Section 3 lifts the semantics of KSS to general clausal logic. Section 4 
introduces the K*-derivation and shows how it characterises the generalised KSS. 
Section 5 uses the K*-derivation as a means of comparing related approaches. 
The paper concludes with a summary and directions for future work. 



2 Background 

This section reviews the necessary background material. After a summary of 
notation and terminology, the notions of ALP and ILP are briefly described in 
order to motivate the underlying task of IE. Relevant definitions and results are 
recalled concerning the semantics of BG and KSS. 

Notation and Terminology. A literal $L$ is an atom $A$ or the (classical) negation
of an atom $\neg A$. A clause $C$ is a set of literals $\{L_1, \ldots, L_n\}$ that for convenience
will be represented as a disjunction $L_1 \vee \ldots \vee L_n$. Any atom that appears negated
in $C$ is called a negative or body atom, and any atom that appears unnegated in
$C$ is called a positive or head atom. A Horn clause is a clause with at most one
head atom. The empty clause is denoted $\Box$. An expression is a term, a literal, or a
clause. A theory $T$ is a set of clauses $\{C_1, \ldots, C_m\}$ that for convenience will be
represented as a conjunction $\{C_1 \wedge \ldots \wedge C_m\}$. This paper assumes a given first-order
language $\mathcal{L}$ that includes Skolem constants. An expression or theory is said to be
Skolem-free whenever it contains no Skolem constant. The symbols $\top$ and $\bot$ will
denote the logical constants for truth and falsity. The symbol $\models$ will denote classical
first-order logical entailment. The equivalence $X \wedge Y \models Z$ iff $X \models \neg Y \vee Z$
will be called the Entailment Theorem. Whenever a clause is used in a logical
formula, it is read as the universal closure of the disjunction of its literals.
Whenever a theory is used in a logical formula, it is read as the conjunction of its
clauses. In general, the symbols $L, M$ will denote literals; $\lambda, \mu$ will denote ground
literals; $P, N$ will denote atoms; $\alpha, \delta$ will denote ground atoms; $S, T$ will denote
theories; and $C, D, E$ will denote clauses. Symbols $B, H$ will denote Skolem-free
theories representing background knowledge and hypotheses, respectively. Symbols
$e, h$ will denote Skolem-free clauses representing examples and hypotheses,
respectively. A substitution $\sigma$ is called a Skolemising substitution for a clause $C$
whenever $\sigma$ binds each variable in $C$ to a fresh Skolem constant. A clause $C$ is
called a factor of a clause $D$ whenever $C = D\phi$ and $\phi$ is a most general unifier
(mgu) of one or more literals in $D$. A clause $C$ is said to $\theta$-subsume a clause $D$,
written $C \succeq D$, whenever $C\theta \subseteq D$ for some substitution $\theta$. A theory $S$ is said to
$\theta$-subsume a theory $T$, written $S \sqsupseteq T$, whenever each clause in $T$ is $\theta$-subsumed
by at least one clause in $S$. If $L$ is a literal, then the complement of $L$, written $\overline{L}$,
denotes the literal obtained by negating $L$ if it is positive, and unnegating $L$ if
it is negative. If $C = L_1 \vee \ldots \vee L_n$ is a clause and $\sigma$ is a Skolemising substitution
for $C$, then the complement of $C$ (using $\sigma$), written $\overline{C}$, is defined as the theory
$\overline{C} = \{\overline{L_1}\sigma \wedge \ldots \wedge \overline{L_n}\sigma\}$. The standard definition of resolvent is assumed, as
defined for example in [2]. A resolution derivation of clause $C$ from theory $T$ is a
finite non-empty sequence of clauses $\mathcal{R} = (R_1, \ldots, R_n = C)$ such that each clause
$R_i \in (R_1, \ldots, R_n)$ is either a fresh variant of some clause $D \in T$, or a resolvent
of two preceding clauses $P, Q \in (R_1, \ldots, R_{i-1})$. In the first case, $R_i$ is called
an input clause, and $D$ is called the generator of $R_i$. In the second case, $R_i$ is
called a resolvent, and $P$ and $Q$ are called the parents of $R_i$. A tree derivation
of $C$ from $T$ is a resolution derivation of $C$ from $T$ in which each clause except
the last is the parent of exactly one child. A derivation of $\Box$ from $T$ will also be
called a refutation from $T$. The composition of two tree derivations $\mathcal{R}_1$ and $\mathcal{R}_2$,
written $\mathcal{R}_1 + \mathcal{R}_2$, is the tree derivation obtained by concatenating the sequence
$\mathcal{R}_2$ on to the sequence $\mathcal{R}_1$, taking care to rename any variables that may clash.
The Subsumption Theorem states that if a theory $T$ logically entails a clause $C$ then
either $C$ is a tautology or else there exists a tree derivation from $T$ of a clause
$D$ that $\theta$-subsumes $C$, as shown for example in [9].



Abductive and Inductive Logic Programming (ALP & ILP) [4, 8] for- 
malise in a logic programming context the notions of explanation and general- 
isation. With respect to a given theory, ALP constructs explanations for given 
observations, while ILP computes generalisations of given examples. Many ALP 
and ILP techniques are incremental in that they focus on one observation or
example at a time and try to construct a partial hypothesis, $H$, that entails this
one example, $e$, relative to the background theory, $B$. This fundamental problem,
which is known as the task of Inverse Entailment (IE), is formally defined as
follows. Given a theory $B$ and a clause $e$, find a theory $H$ such that $B \cup H \models e$.
For reasons of efficiency, some form of search bias is normally imposed on the 
process used to find $H$, and one such method is discussed next.

Bottom Generalisation (BG) [6, 15] is an important approach for IE that
is based on the construction and generalisation of a particular clause called a
Bottom Set. Formally, as shown in Definition 1 below, the Bottom Set of $B$ and
$e$, denoted $Bot(B,e)$, contains the set of ground literals $\mu$ whose complements
are entailed by $B$ and the complement of $e$. As shown in Definition 2, the hypotheses
derivable by BG are those clauses $h$ that $\theta$-subsume $Bot(B,e)$. It is
worth emphasising that $B$, $h$ and $e$ are all assumed to be Skolem-free.

Definition 1 (Bottom Set). Let $B$ be a theory, let $e$ be a clause, let $\sigma$ be
a Skolemising substitution for $e$, and let $\overline{e}$ be the complement of $e$ using $\sigma$.
Then the Bottom Set of $B$ and $e$ (using $\sigma$), denoted $Bot(B,e)$, is the clause
$Bot(B,e) = \{\mu \mid B \cup \overline{e} \models \overline{\mu}\}$ where the $\mu$ are ground literals.

Definition 2 (BG). Let $B$ be a theory, and let $e$ and $h$ be clauses. Then $h$ is
said to be derivable by BG from $B$ and $e$ iff $h \succeq Bot(B,e)$.

The key point is that instead of exploring the entire IE hypothesis space, 
which is intractable, BG only considers a sub-space that is both smaller and better
structured than the original. Formally, this sub-space is the $\theta$-subsumption
lattice bounded by the Bottom Set and the empty set. But, as described below,
the advantage of a more tractable search space comes at the price of incompleteness.
This incompleteness can be characterised by Plotkin’s notion of
C-derivation [10], which is formalised in Definition 3.

Definition 3 (C-derivation). Let $T$ be a theory, and $C$ and $D$ be clauses.
Then a C-derivation of $D$ from $T$ with respect to $C$ is a tree derivation of $D$
from $T \cup \{C\}$ such that $C$ is the generator of at most one input clause. A clause
$D$ is said to be C-derivable from $T$ with respect to $C$, denoted $(T, C) \vdash_c D$, iff
there exists a C-derivation of $D$ from $T$ with respect to $C$.

Informally, a C-derivation is a tree derivation in which some given clause 
C may be used at most once. The important result, as shown in [15], is that 
a hypothesis $h$ is derivable by BG from $B$ and $e$ if and only if there is a C-refutation
from $B \cup \overline{e}$ with respect to $h$. Therefore C-derivations characterise the
restrictions on the hypotheses derivable by BG. In order to (partially) overcome 
these restrictions, the semantics of KSS was introduced, as described next. 

Kernel Set Subsumption (KSS) [12] can be seen as extending BG to derive 
multiple clause hypotheses drawn from a larger hypothesis space. Like BG, KSS 
considers only a bounded lattice based sub-space of the full IE hypothesis space. 
But, whereas BG uses a single clause Bottom Set to bound its search space, KSS 
uses instead a set of clauses called a Kernel Set. The relevant notions are now 
recalled for the Horn clause case in Definitions 4, 5 and 6 below. 

As shown in Definition 4, before a Kernel Set is formed, the inputs $B$ and $e$ are
first normalised by Skolemising $e$ and transferring the body atoms as facts to $B$.
Formally, the normalised example $\epsilon$ is the clause containing the Skolemised head
atom of $e$, while the normalised background knowledge $\mathcal{B}$ is the original theory
$B$ augmented with the Skolemised body atoms of $e$. In all of these definitions
negative clauses are formally treated as if they had the head atom ‘$\bot$’.

Definition 4 (Horn Normalisation). Let $B$ be a Horn theory, let
$e = P \vee \neg N_1 \vee \ldots \vee \neg N_m$ be a Horn clause, and let $\sigma$ be a Skolemising substitution for
$e$. Then the normalisation of $B$ and $e$ (using $\sigma$) consists of the theory
$\mathcal{B} = B \cup \{N_1\sigma \wedge \ldots \wedge N_m\sigma\}$, and the clause $\epsilon = P\sigma$.

Definition 5 (Horn Kernel Set). Let $\mathcal{B}$ and $\epsilon$ be the result of normalising
a Horn theory $B$ and a Horn clause $e$, and let $\mathcal{K} = \{k_1 \wedge \ldots \wedge k_n\}$ be a set of
ground Horn clauses $k_i = \alpha_i \vee \neg\delta_i^1 \vee \ldots \vee \neg\delta_i^{m_i}$. Then $\mathcal{K}$ is called a Kernel Set of
$B$ and $e$ iff $\mathcal{B} \cup \{\alpha_1 \wedge \ldots \wedge \alpha_n\} \models \epsilon$, and $\mathcal{B} \models \delta_i^j$ for all $1 \le i \le n$ and $1 \le j \le m_i$.

Definition 6 (Horn KSS). Let $B$ and $H$ be Horn theories, and let $e$ be a
Horn clause. Then $H$ is said to be derivable by KSS from $B$ and $e$ iff $H \sqsupseteq \mathcal{K}$
for some Kernel Set $\mathcal{K}$ of $B$ and $e$.

As shown in Definition 5 above, a Horn Kernel Set of a Horn theory $B$
and Horn clause $e$ is a Horn theory $\mathcal{K}$, whose head atoms $\alpha_1, \ldots, \alpha_n$ collectively
entail the normalised example $\epsilon$ with respect to $\mathcal{B}$, and whose body atoms $\delta_i^j$
are individually entailed by the normalised background $\mathcal{B}$. Here, $n \ge 1$ denotes
the (non-zero) number of clauses in $\mathcal{K}$, and $m_i \ge 0$ denotes the (possibly-zero)
number of body atoms in the $i$th clause $k_i$. As shown in Definition 6, a theory
$H$ is derivable by KSS whenever it $\theta$-subsumes a Kernel Set $\mathcal{K}$ of $B$ and $e$.

So far KSS is defined only for the Horn clause subset of clausal logic. In 
this context it has been shown in [12] that KSS is sound with respect to IE 
and complete with respect to BG. However, as yet, no exact characterisation 
of the class of hypotheses derivable by KSS has been established. Such a task 
clearly requires a more general notion than the C-derivation. For this purpose, 
the concept of K-derivation, formalised in Definition 7 below, was introduced in 
[12] and conjectured to provide such a characterisation. 

Definition 7 (K-derivation). Let $T$ and $K$ be theories, and let $D$ be a clause.
Then a K-derivation of $D$ from $T$ with respect to $K$ is a tree derivation of $D$
from $T \cup K$ such that each clause $k \in K$ (but not in $T$) is the generator of at
most one input clause, which is called a k-input clause. Clause $D$ is said to be
K-derivable from $T$ with respect to $K$, denoted $(T, K) \vdash_k D$, iff there exists a
K-derivation of $D$ from $T$ with respect to $K$.

The notion of K-derivation generalises that of C-derivation in the following 
way. Whereas the C-derivation refers to a clause C that may be used at most 
once, in a K-derivation there are a set of clauses K each of which may be used at 
most once. A C-derivation is therefore a special case of a K-derivation, in which 
this set K = {C} is a singleton. 

In the next two sections, the semantics of KSS is extended to general clausal 
logic and a refinement of the K-derivation, called a K*-derivation, is introduced 
in order to provide a precise characterisation of KSS in the general case. The 
soundness and completeness results mentioned above are also lifted to the general 
case and the conjecture is proved.



3 Kernel Set Semantics for General Clausal Logic



In this section the semantics of KSS is generalised from Horn clauses to arbitrary 
clauses. It is shown in this general case that KSS remains sound with respect to 
IE and continues to subsume the semantics of BG. First the notion of normalisa- 
tion is generalised in Definition 8, and two key properties are shown in Lemma 1. 
Then the generalised notion of Kernel Set is formalised in Definition 9. 

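Definition 8 (Normalisation). Let $B$ be a theory, let
$e = P_1 \vee \ldots \vee P_n \vee \neg N_1 \vee \ldots \vee \neg N_m$ be a clause, and let $\sigma$ be a Skolemising
substitution for $e$. Then the normalisation of $B$ and $e$ (using $\sigma$) consists of the theory
$\mathcal{B} = B \cup \{N_1\sigma \wedge \ldots \wedge N_m\sigma\}$, and the clause $\epsilon = P_1\sigma \vee \ldots \vee P_n\sigma$.

Lemma 1. Let $\mathcal{B}$ and $\epsilon$ be the result of normalising a theory $B$ and a clause
$e = P_1 \vee \ldots \vee P_n \vee \neg N_1 \vee \ldots \vee \neg N_m$ using $\sigma$. Let $\overline{e}$ denote the complement of $e$,
also using $\sigma$. Then (1) $\mathcal{B} \cup T \models \epsilon$ iff $B \cup \overline{e} \cup T \models \Box$ for all theories $T$, and (2)
$\mathcal{B} \cup H \models \epsilon$ iff $B \cup H \models e$ for all (Skolem-free) theories $H$.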



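Proof. Taking each case in turn:

1. $\mathcal{B} \cup T \models \epsilon$ iff $B \cup \{N_1\sigma \wedge \ldots \wedge N_m\sigma\} \cup T \models P_1\sigma \vee \ldots \vee P_n\sigma$ (by Definition 8),
   iff $B \cup \{N_1\sigma \wedge \ldots \wedge N_m\sigma \wedge \neg P_1\sigma \wedge \ldots \wedge \neg P_n\sigma\} \cup T \models \Box$ (by the Entailment
   Theorem), iff $B \cup \overline{e} \cup T \models \Box$ (by properties of complementation).

2. $\mathcal{B} \cup H \models \epsilon$ iff $B \cup \{N_1\sigma \wedge \ldots \wedge N_m\sigma\} \cup H \models P_1\sigma \vee \ldots \vee P_n\sigma$ (by Definition 8), iff
   $B \cup H \models P_1\sigma \vee \ldots \vee P_n\sigma \vee \neg N_1\sigma \vee \ldots \vee \neg N_m\sigma$ (by the Entailment Theorem),
   iff $B \cup H \models e\sigma$ (by properties of substitution), iff $B \cup H \models e$ (the forward
   direction uses the fact that $\sigma$ binds each variable in $e$ to a constant not in $B$, $H$
   or $e$; the reverse direction uses the fact that $e\sigma$ is an instance of $e$).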



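Definition 9 (Kernel Set). Let $\mathcal{B}$ and $\epsilon$ be the result of normalising a theory
$B$ and a clause $e$. A Kernel Set of $B$ and $e$ is a ground theory $\mathcal{K}$ that can be
written in the form:

$$\mathcal{K} = \left\{ \begin{array}{l}
\lambda_1^0 \vee \lambda_1^1 \vee \cdots \vee \lambda_1^{m_1} \\
\qquad \vdots \\
\lambda_i^0 \vee \lambda_i^1 \vee \cdots \vee \lambda_i^j \vee \cdots \vee \lambda_i^{m_i} \\
\qquad \vdots \\
\lambda_n^0 \vee \lambda_n^1 \vee \cdots \vee \lambda_n^{m_n}
\end{array} \right\}$$

where $\mathcal{B} \cup \{\lambda_1^0 \wedge \ldots \wedge \lambda_n^0\} \models \epsilon$, and $\mathcal{B} \cup \{\lambda_i^j\} \models \epsilon$ for all $1 \le i \le n$ and $1 \le j \le m_i$.
In this case, the literals $\lambda_1^0, \ldots, \lambda_n^0$ are called the key literals of $\mathcal{K}$.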



In moving from the Horn case to the general case, the role previously played
by the head atoms $\alpha_i$ is now played by the so-called key literals $\lambda_i^0$. Although any
literal in a clause can be chosen as the key literal, for notational convenience it
will be assumed that the key literal of a kernel clause $k_i$ will always be denoted
$\lambda_i^0$. As shown in Definition 9 above, for $\mathcal{K}$ to be a Kernel Set of $B$ and $e$ the key
literals $\lambda_1^0, \ldots, \lambda_n^0$ must collectively entail $\epsilon$ with respect to $\mathcal{B}$, and the non-key
literals $\lambda_i^j$ must individually entail $\epsilon$ with respect to $\mathcal{B}$.



Generalised Kernel Sets for Inverse Entailment 






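It is straightforward to show any Horn theory $\mathcal{K}$ that is a Kernel Set by
Definition 5 is also a Kernel Set by Definition 9. From Definition 5 it holds
$\mathcal{B} \cup \{\alpha_1 \wedge \ldots \wedge \alpha_n\} \models \epsilon$ and $\mathcal{B} \models \delta_i^j$. From the latter it follows $\mathcal{B} \cup \{\neg\delta_i^j\} \models \bot \models \epsilon$.
Upon identifying each key literal $\lambda_i^0$ with the head atom $\alpha_i$ and each non-key
literal $\lambda_i^j$ with the negated body atom $\neg\delta_i^j$ it follows that $\mathcal{B} \cup \{\lambda_1^0 \wedge \ldots \wedge \lambda_n^0\} \models \epsilon$
and $\mathcal{B} \cup \{\lambda_i^j\} \models \epsilon$. Hence $\mathcal{K}$ is also a Kernel Set by Definition 9.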

As formalised in Definition 10 below, the notion of KSS is the same in the
general case as in the Horn case. As before, a hypothesis $H$ is derivable by KSS from
$B$ and $e$ whenever it $\theta$-subsumes a Kernel Set of $B$ and $e$. The only difference
is that general clauses are now used in place of Horn clauses, and the general 
Kernel Set replaces the Horn Kernel Set. As shown in Theorems 1 and 2, the 
key results from the Horn clause case apply also in the general case. 

Definition 10 (KSS). Let $B$ and $H$ be theories, and $e$ be a clause. Then $H$
is derivable by KSS from $B$ and $e$ iff $H \sqsupseteq \mathcal{K}$ for some Kernel Set $\mathcal{K}$ of $B$ and $e$.

Theorem 1 (Soundness of KSS wrt IE). Let $B$ and $H$ be theories, let $e$
be a clause, and let $\mathcal{K} = \{k_1 \wedge \ldots \wedge k_n\}$ be a Kernel Set of $B$ and $e$. Then
$H \sqsupseteq \mathcal{K}$ implies $B \cup H \models e$.

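Proof. By Definition 9 it holds that $\mathcal{B} \cup \{\lambda_1^0 \wedge \ldots \wedge \lambda_n^0\} \models \epsilon$. Therefore
$B \cup \overline{e} \cup \{\lambda_1^0 \wedge \ldots \wedge \lambda_n^0\} \models \Box$ by Lemma 1. Hence $B \cup \overline{e} \models \neg\lambda_1^0 \vee \ldots \vee \neg\lambda_n^0$ by the
Entailment Theorem. If $M$ is any model of $B \cup \overline{e}$ then for some $1 \le i \le n$ it
follows $M$ falsifies the key literal $\lambda_i^0$. But, by an analogous argument, it also
follows from Definition 9 and Lemma 1 that $B \cup \overline{e} \models \neg\lambda_i^j$ for all $1 \le i \le n$
and $1 \le j \le m_i$. Hence $M$ also falsifies all of the non-key literals $\lambda_i^1, \ldots, \lambda_i^{m_i}$.
Therefore $M$ falsifies the Kernel clause $k_i = \lambda_i^0 \vee \lambda_i^1 \vee \ldots \vee \lambda_i^{m_i}$ and also, therefore,
the Kernel theory $\mathcal{K} = \{k_1 \wedge \ldots \wedge k_n\}$. Consequently, $B \cup \overline{e} \cup \mathcal{K} \models \Box$. Since $H \sqsupseteq \mathcal{K}$,
it follows $H \models \mathcal{K}$, and so $B \cup \overline{e} \cup H \models \Box$. Therefore $\mathcal{B} \cup H \models \epsilon$ by Lemma 1 part
(1) and hence $B \cup H \models e$ by Lemma 1 part (2).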

Theorem 2 (KSS Extends BG). Let $B$ be a theory, let
$e = P_1 \vee \ldots \vee P_n \vee \neg N_1 \vee \ldots \vee \neg N_m$ be a clause, let $Bot(B,e)$ be the Bottom Set of $B$ and $e$ using
$\sigma$, and let $h = L_0 \vee \ldots \vee L_p$ be a clause. Then the clause $h$ is derivable by BG from
$B$ and $e$ only if the theory $H = \{h\}$ is derivable by KSS from $B$ and $e$.

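Proof. Suppose $h$ is derivable by BG from $B$ and $e$. Then $h \succeq Bot(B,e)$ by
Definition 2, and so $h\theta \subseteq Bot(B,e)$ for some $\theta$. By Definition 1 it holds
$B \cup \overline{e} \models \overline{L_i\theta}$ for all $0 \le i \le p$. Since $L_i\theta$ is a ground literal, $\overline{L_i\theta} \equiv \neg L_i\theta$ and so
$B \cup \overline{e} \cup \{L_i\theta\} \models \Box$ by the Entailment Theorem. Consequently, $\mathcal{B} \cup \{L_i\theta\} \models \epsilon$
by Lemma 1. By Definition 9 the theory $\mathcal{K} = \{L_0\theta \vee L_1\theta \vee \ldots \vee L_p\theta\}$ is a single
clause Kernel Set of $B$ and $e$. By construction $\{h\} \sqsupseteq \mathcal{K}$ and hence it follows by
Definition 10 that $H = \{h\}$ is derivable by KSS from $B$ and $e$.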

To show KSS is stronger than BG, simply let $B = \{p \vee \neg q(a) \vee \neg q(b)\}$ and
$e = p$. In this case the three hypotheses $\{q(x)\}$ and $\{q(a) \wedge q(b)\}$ and $\{p\}$ are
all derivable by KSS, but only the last one is derivable by BG from $B$ and $e$.
In order to illustrate the ideas presented above and to show how KSS can also 
derive non-Horn theories, this section now concludes with Example 1 below. 
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Example 1. Let the background theory B represent the knowledge that anyone 
in a bar may be served a drink unless he is a child , and that anyone in a cafe 
may be served a drink. Let the example e denote the fact that all adults may 
be served a drink. 



{ drink V child V -i bar 
drink V -i cafe 



e = drink V -> adult 



Then the hypothesis H shown below, which states that an adult will go either 
to the cafe or to the bar, and that no one is both an adult and a child, is correct 
with respect to IE since it can be verified that $B \cup H \models e$. 



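$$H = \left\{ \begin{array}{l} \textit{bar} \lor \textit{cafe} \lor \neg \textit{adult} \\ \neg \textit{child} \lor \neg \textit{adult} \end{array} \right\}$$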



Using the abbreviations a = adult, b = bar, c = child, d = drink, f = cafe it 
can be verified that the result of normalising $B$ and $e$ consists of the theory $\mathcal{B}$ 
and the clause $\epsilon$ shown below, and that the following sequents are true. 



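$$\mathcal{B} = \left\{ \begin{array}{l} d \lor c \lor \neg b \\ d \lor \neg f \\ a \end{array} \right\}$$

$$\epsilon = d$$

$$\mathcal{B} \cup \{b \land \neg c\} \models \epsilon$$
$$\mathcal{B} \cup \{\neg a\} \models \epsilon$$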



Therefore, by Definitions 9 and 10, theory $H$ is a Kernel Set of $B$ and $e$, 
with key literals $b$ and $\neg c$, and $H$ is derivable by KSS from $B$ and $e$. But note 
$Bot(B, e) = \textit{drink} \lor \textit{cafe} \lor \neg \textit{adult}$ and so neither clause in $H$ is derivable by BG. 
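
The claims made in Example 1 can be checked mechanically. The following Python sketch is not part of the original development: it assumes a purely propositional reading of the example, encodes a clause as a set of signed atoms, and decides entailment by enumerating all truth assignments. The helper names (`clause`, `entails`, `bottom_set`) are hypothetical, and the unit clause `a` in the normalised theory reflects the assumption that normalisation moves the complemented negative literal of e into the background theory.

```python
from itertools import product

ATOMS = ["a", "b", "c", "d", "f"]  # adult, bar, child, drink, cafe

def lit(s):
    """Parse 'x' or '-x' into a signed atom (name, polarity)."""
    return (s[1:], False) if s.startswith("-") else (s, True)

def clause(*lits):
    return frozenset(lit(s) for s in lits)

def holds(cl, model):
    return any(model[atom] == pol for atom, pol in cl)

def entails(theory, cl):
    """True iff every model of `theory` satisfies `cl` (truth-table check)."""
    for values in product([False, True], repeat=len(ATOMS)):
        model = dict(zip(ATOMS, values))
        if all(holds(c, model) for c in theory) and not holds(cl, model):
            return False
    return True

def bottom_set(theory, example):
    """Ground literals L such that theory plus the negated example entails the complement of L."""
    negated = [frozenset({(atom, not pol)}) for atom, pol in example]
    return frozenset((atom, pol) for atom in ATOMS for pol in (True, False)
                     if entails(theory + negated, frozenset({(atom, not pol)})))

B = [clause("d", "c", "-b"), clause("d", "-f")]
H = [clause("b", "f", "-a"), clause("-c", "-a")]
e = clause("d", "-a")

# Assumed normalisation: the negative literal of e becomes the fact a.
B_norm = B + [clause("a")]
eps = clause("d")

print(entails(B + H, e))                                    # H is correct w.r.t. B and e
print(entails(B_norm + [clause("b"), clause("-c")], eps))   # key literals b and not-c
print(sorted(bottom_set(B_norm, eps)))                      # literals of the Bottom Set
print([h <= bottom_set(B_norm, eps) for h in H])            # is either clause a sub-clause?
```

Under these assumptions the Bottom Set comes out as the three literals drink, cafe and the negation of adult, and neither clause of H is contained in it, which agrees with the remark that neither clause is derivable by BG.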

4 Characterisation in Terms of K*-Derivations 

This section provides a sound and complete characterisation of Kernel Sets 
in terms of a new derivation, called a K*-derivation. The K*-derivation is at 
the same time a generalisation of the C-derivation and a refinement of the K- 
derivation. Where a K-derivation requires that each clause in a set K is used at 
most once, a K*-derivation imposes one additional restriction on the way such 
clauses are used. The basis of this restriction is a new notion, called T-reduction, 
formalised in Definition 11 below and illustrated in Fig 1. 

Definition 11 (T-reduction). Let $T$ be a theory, and let $C$ and $D$ be clauses. 
Then a T-reduction of $C$ to $D$ is a C-derivation $\mathcal{R}$ of $D$ from $T$ with respect to 
$C$ such that $\mathcal{R} = \mathcal{R}^1 + \dots + \mathcal{R}^m + (C = C^0, \dots, C^m = D)$ where for all $1 \le i \le m$ 
each $\mathcal{R}^i$ is a tree derivation of a clause $E^i$ from $T$, and $C^i$ is a resolvent of $C^{i-1}$ 
with a unit factor of $E^i$. The clause $C$ is said to be reduced to $D$ by $T$ in $\mathcal{R}$. 
Fig. 1. T-reduction 



Fig. 2. Abbreviation of Fig 1 



Informally, T-reduction is the process of progressively resolving a clause $C$ 
with clauses derived from a theory $T$. Fig 1 illustrates the T-reduction of a 
ground clause $C = C^0 = \lambda^0 \lor \lambda^1 \lor \dots \lor \lambda^m$ to the ground literal $\lambda^0 = C^m = D$. 
Each descendent $C^i = \lambda^0 \lor \lambda^{i+1} \lor \dots \lor \lambda^m$ of $C$ differs from its predecessor $C^{i-1}$ in 
the removal of a literal $\lambda^i$, which is ‘resolved away’ by a clause $E^i$ derived from 
$T$. Each wedge denotes the tree derivation $\mathcal{R}^i$ of the clause $E^i$ shown at the base 
of the wedge, from the theory $T$ shown at the top. 

To simplify the representation of the T-reduction of a ground clause to a 
single literal, it is convenient to introduce the graphical abbreviation shown in 
Fig 2. Just as in Fig 1, the wedges denote the tree derivations $\mathcal{R}^i$ of the clauses 
$E^i$. But now, instead of the clause $E^i$, the complementary literal $\lambda^i$ appears at 
the base of each wedge. The black tip of each wedge emphasises that it is not the 
literal $\lambda^i$ which is derived, but the clause $E^i$ that resolves away $\lambda^i$. Intuitively, 
this graphic shows the literals of $C$ being resolved away until only $\lambda^0$ remains. 

The notion of T-reduction is now used to define the concept of K*-derivation. 
As formalised in Definition 12, a K*-derivation consists of a principal derivation, 
which is a tree derivation of some clause $C_0$ from $T$, and zero or more reduction 
trees, in each of which a k-input clause $k_i$ is reduced to a clause $D_i$ by $T$. Clause 
$C_0$ is then reduced to $D$ in the ‘tail’ $(C_1, \dots, C_n)$ of the derivation, by resolving 
with each of the $D_i$ in turn. For an example, the reader is referred to Fig 5. 

Definition 12 (K*-derivation). Let $T$ and $K$ be theories, and let $D$ be a 
clause. Then a K*-derivation of $D$ from $T$ with respect to $K$ is a K-derivation 
$\mathcal{R}$ of $D$ from $T$ with respect to $K$ such that $\mathcal{R} = \mathcal{R}_0 + \mathcal{R}_1 + \dots + \mathcal{R}_n + (C_1, \dots, C_n = D)$ 
where $\mathcal{R}_0$ is a tree derivation, called the principal derivation, of a clause $C_0$ from 
$T$, and for all $1 \le i \le n$ each $\mathcal{R}_i$ is a T-reduction, called a reduction tree, of a 
k-input clause $k_i \in K$ to a clause $D_i$, and each $C_i$ is the resolvent of $C_{i-1}$ with 
a unit factor of $D_i$. The clause $D$ is said to be K*-derivable from $T$ with respect 
to $K$, denoted $(T, K) \vdash_{k*} D$, whenever such a derivation exists. 

The rest of this section shows how the K*-derivation provides a sound and 
complete characterisation of KSS. First, Lemma 2 demonstrates an important 
relationship between abduction and K*-derivations. Informally, this result states 
that all of the ground facts used in a refutation need be used just once, at the 
very end. Such a refutation $\mathcal{R}$ is shown in Fig 3, where the principal derivation 
$\mathcal{R}_0$ is denoted by the wedge, and $\lambda'_1, \dots, \lambda'_p$ are the ground facts used. 

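Lemma 2. Let $T$ be a theory, and let $\Delta = \{\lambda_1 \land \dots \land \lambda_n\}$ be a ground unit 
theory. Then $T \cup \Delta \models \square$ implies $(T, \Delta) \vdash_{k*} \square$ with a K*-derivation of the form 
$\mathcal{R} = \mathcal{R}_0 + (\lambda'_1, \dots, \lambda'_p) + (C_1, \dots, C_p = \square)$ for some $\{\lambda'_1 \land \dots \land \lambda'_p\} \subseteq \Delta$. 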

Proof. Suppose that $T \cup \Delta \models \square$. Then by the Entailment Theorem $T \models D$ 
where $D = \bar{\lambda}_1 \lor \dots \lor \bar{\lambda}_n$. If $D$ is a tautology, then $\Delta$ contains two complementary 
unit clauses, which means there is a trivial tree derivation of $\square$ from these two 
unit clauses, and so $(T, \Delta) \vdash_{k*} \square$ by Definition 12. If $D$ is not a tautology, then 
by the Subsumption Theorem there is a tree derivation $\mathcal{R}_0$ from the theory $T$ 
of a clause $C_0$ such that $C_0\theta \subseteq D$ for some substitution $\theta$. Now, define the set 
$\Delta' = \{\lambda'_1, \dots, \lambda'_p\}$ such that $\Delta' = \overline{C_0\theta} \cap \Delta$ (i.e. $\Delta'$ is the subset of $\Delta$ whose 
literals are complementary to those in $C_0$ under $\theta$). Next, let $(C_1, \dots, C_p = \square)$ 
be the (unique) sequence of clauses $C_i = M_i^1 \lor M_i^2 \lor \dots$ such that each 
clause $C_{i+1} \in (C_1, \dots, C_p)$ is the resolvent of $C_i$ and $\lambda'_{i+1}$ on the factor $C_i\phi_i$ of 
$C_i$ where $\phi_i$ is the mgu of the set $S_i = \{M_i^j \in C_i \mid M_i^j\theta = \overline{\lambda'_{i+1}}\}$ (i.e. $S_i$ is 
the subset of $C_i$ whose literals are complementary to $\lambda'_{i+1}$ under $\theta$). Then, by 
Definition 12, it follows $\mathcal{R} = \mathcal{R}_0 + (\lambda'_1, \dots, \lambda'_p) + (C_1, \dots, C_p = \square)$ is a K*-refutation 
from $T$ with respect to $\Delta$, with principal derivation $\mathcal{R}_0$ and trivial reduction 
trees where each $\mathcal{R}_i = (\lambda'_i)$ simply contains the unit clause $\lambda'_i \in \Delta$ (see Fig 3). 

Lemma 2 is now used in Theorem 3 to show that a theory $\mathcal{K}$ is a Kernel Set 
of $B$ and $e$ if and only if there exists a K*-refutation from $\mathcal{B} \cup \bar{\epsilon}$ with respect to 
$\mathcal{K}$. Informally, as illustrated in Fig 4, a K*-derivation can always be constructed 
in which (zero or more) Kernel clauses are reduced by the theory $\mathcal{B} \cup \bar{\epsilon}$ to their 
key literals. The key literals are then used one by one in the tail of the derivation 
to resolve away the clause $C_0$ derived in the principal derivation. 

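Theorem 3 (K*-derivations characterise Kernel Sets). Let $\mathcal{B}$ and $\epsilon$ be 
the result of normalising a theory $B$ and clause $e$ using $\sigma$. Let $\bar{\epsilon}$ be the complement 
of $\epsilon$, also using $\sigma$. Let $\mathcal{K} = \{k_1 \land \dots \land k_n\}$ be a theory of ground clauses 
written $k_i = \lambda_i^0 \lor \lambda_i^1 \lor \dots \lor \lambda_i^{m_i}$, and let $\Delta = \{\lambda_1^0 \land \dots \land \lambda_n^0\}$. Then the theory 
$\mathcal{K}$ is a Kernel Set of $B$ and $e$ iff $(\mathcal{B} \cup \bar{\epsilon}, \mathcal{K}) \vdash_{k*} \square$. 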

Proof. Taking the “if” and “only if” cases individually: 

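1. Suppose $(\mathcal{B} \cup \bar{\epsilon}, \mathcal{K}) \vdash_{k*} \square$. Then by Definition 12 for some $0 \le p \le n$ there 
is a K*-derivation $\mathcal{R} = \mathcal{R}_0 + \mathcal{R}_1 + \dots + \mathcal{R}_p + (C_1, \dots, C_p = \square)$ with principal 
derivation $\mathcal{R}_0$ and reduction trees $\mathcal{R}_1, \dots, \mathcal{R}_p$. By Definition 11 it follows 
for all $1 \le i \le p$ that $\mathcal{R}_i = \mathcal{R}_i^1 + \dots + \mathcal{R}_i^{m_i} + (k_i = C_i^0, \dots, C_i^{m_i} = \lambda'_i)$ is a T- 
reduction of a ground k-input clause $k_i \in \mathcal{K}$ to a single literal $\lambda'_i \in k_i$ by 
the theory $\mathcal{B} \cup \bar{\epsilon}$. Without loss of generality, assume $\mathcal{K}$ has been written so 
$\mathcal{K} = \{k_1 \land \dots \land k_p \land \dots \land k_n\}$ and $k_i = \lambda_i^0 \lor \lambda_i^1 \lor \dots \lor \lambda_i^{m_i}$ with $\lambda'_i = \lambda_i^0$ for all 
$1 \le i \le p$. Therefore $\mathcal{R}' = \mathcal{R}_0 + (\lambda_1^0, \dots, \lambda_p^0) + (C_1, \dots, C_p = \square)$ is a tree refutation 
from $\mathcal{B} \cup \bar{\epsilon} \cup \Delta$. Hence $\mathcal{B} \cup \bar{\epsilon} \cup \Delta \models \square$ by the soundness of resolution, and 
so $\mathcal{B} \cup \Delta \models \epsilon$ by Lemma 1. Now, by Definition 11 each $\mathcal{R}_i^j$ with $1 \le i \le p$ 
and $1 \le j \le m_i$ is a tree derivation of a clause $D_i^j$ from $\mathcal{B} \cup \bar{\epsilon}$ such that 
$D_i^j$ resolves away $\lambda_i^j$. Therefore $\mathcal{R}_i^j + (\lambda_i^j) + (\square)$ is a tree refutation from 
$\mathcal{B} \cup \bar{\epsilon} \cup \{\lambda_i^j\}$. Hence $\mathcal{B} \cup \bar{\epsilon} \cup \{\lambda_i^j\} \models \square$ by soundness of resolution, and so 
$\mathcal{B} \cup \{\lambda_i^j\} \models \epsilon$ by Lemma 1. Since $\mathcal{B} \cup \Delta \models \epsilon$ and $\mathcal{B} \cup \{\lambda_i^j\} \models \epsilon$ it follows by 
Definition 9 that $\mathcal{K}$ is a Kernel Set of $B$ and $e$ with key literals $\Delta$. 

Fig. 3. Abductive Derivation 

Fig. 4. K*-derivation 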

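2. Suppose $\mathcal{K}$ is a Kernel Set of $B$ and $e$ with key literals $\Delta$. By Definition 9 it 
follows $\mathcal{B} \cup \Delta \models \epsilon$. Consequently, $\mathcal{B} \cup \bar{\epsilon} \cup \Delta \models \square$ by Lemma 1. By Lemma 2 
there is a K*-refutation $\mathcal{R} = \mathcal{R}_0 + (\lambda'_1, \dots, \lambda'_p) + (C_1, \dots, C_p = \square)$ from $\mathcal{B} \cup \bar{\epsilon}$ for 
some $\{\lambda'_1 \land \dots \land \lambda'_p\} \subseteq \Delta$ (see Fig 3). Without loss of generality, assume $\mathcal{K}$ 
has been written so $\mathcal{K} = \{k_1 \land \dots \land k_p \land \dots \land k_n\}$ and $k_i = \lambda_i^0 \lor \lambda_i^1 \lor \dots \lor \lambda_i^{m_i}$ 
with $\lambda'_i = \lambda_i^0$ for all $1 \le i \le p$. By Definition 9 for each $1 \le j \le m_i$ it follows 
$\mathcal{B} \cup \{\lambda_i^j\} \models \epsilon$. Consequently $\mathcal{B} \cup \bar{\epsilon} \cup \{\lambda_i^j\} \models \square$ by Lemma 1. By Lemma 2 there 
is a K*-refutation $\mathcal{R}_i^j + (\lambda_i^j) + (\square)$ in which $\lambda_i^j$ may or may not be used. If $\lambda_i^j$ 
is not used, then by Definition 12 it trivially follows $(\mathcal{B} \cup \bar{\epsilon}, \mathcal{K}) \vdash_{k*} \square$ and the 
theorem is proved. If $\lambda_i^j$ is used, then by Definition 12 it follows $\mathcal{R}_i^j$ is a tree 
derivation from $\mathcal{B} \cup \bar{\epsilon}$ of a clause $E_i^j$ that resolves away $\lambda_i^j$. Now, for all $1 \le 
i \le p$ and $0 \le j \le m_i$ define the clause $C_i^j = \lambda_i^0 \lor \dots \lor \lambda_i^j$ and the derivation 
$\mathcal{R}_i = \mathcal{R}_i^1 + \dots + \mathcal{R}_i^{m_i} + (k_i = C_i^{m_i}, \dots, C_i^1, C_i^0 = \lambda_i^0)$. Then by Definition 11 it follows 
$\mathcal{R}_i$ is a tree derivation in which $k_i$ is T-reduced to $\lambda_i^0$ by $\mathcal{B} \cup \bar{\epsilon}$. Hence by 
Definition 11 the tree derivation $\mathcal{R}' = \mathcal{R}_0 + \mathcal{R}_1 + \dots + \mathcal{R}_p + (C_1, \dots, C_p = \square)$ is a 
K*-refutation from $\mathcal{B} \cup \bar{\epsilon}$ with respect to $\mathcal{K}$ (see Fig 4). Thus $(\mathcal{B} \cup \bar{\epsilon}, \mathcal{K}) \vdash_{k*} \square$ 
by Definition 12. 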

To characterise the hypotheses derivable by KSS in terms of K*-derivations, 
one complication must be addressed. Given that $H \succeq \mathcal{K}$, it is possible for one 
clause $h \in H$ to $\theta$-subsume more than one clause in $\mathcal{K}$, so that more than 
one instance of $h$ is needed to derive $\square$ from $\mathcal{B} \cup \bar{\epsilon}$ and $H$. For example, let 
$B = \{p \lor \neg q(a) \lor \neg q(b)\}$ and $e = p$. Then hypothesis $H = q(X)$ subsumes the 
Kernel Set $\mathcal{K} = \{q(a) \land q(b)\}$, and two instances of $H$ are required. 

One way of handling this complication is by treating the hypothesis $H$ as a 
multi-set, and treating the relation $H \succeq \mathcal{K}$ as an injection that maps each clause 
$k$ in set $\mathcal{K}$ to a clause $h$ in multi-set $H$ such that $h \succeq k$. Then it can be shown 
that a theory $H$ is derivable by KSS from a theory $B$ and a clause $e$, if and only 
if there is a K*-refutation from $\mathcal{B} \cup \bar{\epsilon}$ with respect to $H$. For completeness the 
proof of this result is sketched in Corollary 1 below. 

Corollary 1. Let $B$ be a set of clauses, let $e$ be a clause, and let $H$ be a multi-set 
of clauses. Then $H$ is derivable by KSS from $B$ and $e$ iff $(\mathcal{B} \cup \bar{\epsilon}, H) \vdash_{k*} \square$. 

Proof. (Sketch) 

1. Suppose $H$ is derivable by KSS from $B$ and $e$. Then there is a theory $\mathcal{K}$ such 
that $H \succeq \mathcal{K}$ and $\mathcal{K}$ is a Kernel Set of $B$ and $e$. By Theorem 3 there is a 
K*-refutation from $\mathcal{B} \cup \bar{\epsilon}$ with respect to $\mathcal{K}$. Now replace each reduction tree 
of a k-input clause $k$ by the reduction tree of a fresh variant of the clause $h$ 
to which $k$ is mapped by $\succeq$. After appropriate syntactic changes, the result 
is a K*-refutation from $\mathcal{B} \cup \bar{\epsilon}$ with respect to $H$. 

2. Suppose such a K*-derivation exists. Replace each reduction tree of a k-input 
clause $h$ by the reduction tree of a ground instance $k$ of $h$ consistent 
with the substitutions in the derivation. After appropriate syntactic changes, 
the result is a K*-refutation from $\mathcal{B} \cup \bar{\epsilon}$ with respect to the clauses $k$. By 
Theorem 3 this set of clauses is a Kernel Set of $B$ and $e$, and by construction 
it is $\theta$-subsumed by $H$. Therefore $H$ is derivable by KSS from $B$ and $e$. 

In order to illustrate the concepts introduced above, this section concludes 
by presenting a full K*-derivation for Example 1. As shown in Fig 5, the reduction 
trees of the two underlined hypothesis clauses to the literals $b$ and $\neg c$ are 
indicated by the dashed triangles. The principal derivation of the clause $c \lor \neg b$ 
containing the complements of these literals is marked by the dashed rectangle. 
The tail of the derivation is shown by the dashed ellipse. Finally, Fig 6 shows how 
this derivation is abbreviated using the compact notation introduced in Fig 2. 
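
The shape of such a derivation can be replayed step by step. The Python sketch below is an illustration only: it assumes the propositional abbreviations of Example 1 (a, b, c, d, f), takes the theory to be the normalised background clauses together with the negated example, and uses a hypothetical helper `resolve` for ground binary resolution. The particular order of resolution steps is one possible choice and need not coincide with the one drawn in Fig 5.

```python
def neg(l):
    """Complement of a ground literal written as 'x' or '-x'."""
    return l[1:] if l.startswith("-") else "-" + l

def resolve(c1, c2, l):
    """Resolvent of ground clauses c1 and c2 on literal l (l in c1, its complement in c2)."""
    assert l in c1 and neg(l) in c2, "clauses do not clash on the given literal"
    return (c1 - {l}) | (c2 - {neg(l)})

def C(*lits):
    return frozenset(lits)

# Assumed theory T: normalised background clauses plus the negated example.
T = [C("d", "c", "-b"), C("d", "-f"), C("a"), C("-d")]
# The two hypothesis clauses act as k-input clauses, each used once.
k1, k2 = C("b", "f", "-a"), C("-c", "-a")

# Principal derivation: a clause C0 derived from T alone.
c0 = resolve(T[0], T[3], "d")                      # c v -b

# Reduction tree for k1: f and -a are resolved away by clauses derived from T.
e1 = resolve(T[1], T[3], "d")                      # -f, derived from T
d1 = resolve(resolve(k1, e1, "f"), T[2], "-a")     # k1 reduced to b

# Reduction tree for k2: -a is resolved away by the fact a.
d2 = resolve(k2, T[2], "-a")                       # k2 reduced to -c

# Tail: C0 is resolved with each reduced clause in turn.
c1 = resolve(c0, d1, "-b")                         # c
c2 = resolve(c1, d2, "c")                          # empty clause

print(sorted(c0), sorted(d1), sorted(d2), sorted(c2))
```

Each hypothesis clause enters exactly once and is reduced to its key literal before the tail begins, which is the restriction that separates a K*-derivation from an arbitrary K-derivation.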



5 Related Work 

The semantics of KSS is aimed ultimately at extending the principles of BG in 
order to provide a more general context for developing practical proof procedures 
for IE. However, it can also be used as a means of systematically comparing ex- 
isting methods for BG and KSS. In particular, as shown in the previous section, 
independently of how a hypothesis H is actually computed from B and e, there 
will always be an associated K*-refutation from $\mathcal{B} \cup \bar{\epsilon}$ with respect to $H$. Consequently, 
existing methods can be classified in terms of the restrictions needed 
on the principal and reduction derivations in order to characterise the class of 
derivable hypotheses. Several methods are now compared in this way. 
Fig. 5. K*-derivation of Example 1 



Fig. 6. Abbreviation of Fig 5 



Progol4 [6] is one of the best known and widely applied systems in ILP. This 
procedure uses a methodology called Mode Directed Inverse Entailment (MDIE) 
that efficiently implements BG through the use of user-specified language bias. A subset 
of the Bottom Set is constructed by an SLD procedure and is then generalised 
by a lattice based search routine. Like all BG approaches, Progol4 induces only a 
single clause for each example, and like most existing procedures it is restricted 
to Horn clause logic. The hypotheses derivable by Progol4 are associated with 
K*-derivations of a simple form. The principal derivation always consists of a 
single unit clause containing the (unique) negative literal from e; and this literal 
is always resolved away by a single reduction tree in which the (only) hypothesis 
clause is reduced to an instance of its head atom. 

Progol5 [7] is the latest member of the Progol family. This proof procedure 
realises a technique called Theory Completion by Inverse Entailment (TCIE) 
that augments MDIE with a reasoning mechanism based on contrapositive lock- 
ing [13]. Although the principal derivations associated with Progol5 hypotheses 
also result in a negative unit clause, unlike Progol4 they may involve the non- 
trivial derivation of a negative literal distinct from that in e. However, due to 
an incompleteness of the contrapositive reasoning mechanism identified in [12], 
no merging of literals may occur within the principal derivation. In the corre- 
sponding reduction tree, the single hypothesis clause is reduced to an instance 
of its head atom. 

HAIL [11] is a recently proposed proof procedure that extends TCIE with the 
ability to derive multiple clause hypotheses within the semantics of KSS. This 
procedure is based on an approach called Hybrid Abductive-Inductive Learning 
that integrates explicit abduction, deduction and induction, within a cycle of 
learning that generalises the mode-directed approach of Progol5. Key literals 
of the Kernel Set are computed using an ALP proof procedure [5], while non- 
key literals are computed using the same SLD procedure used by Progol. Like 
Progol5, HAIL is currently restricted to Horn clause logic, but unlike Progol, 
the hypotheses derivable by HAIL can give rise to K*-derivations in which there 
is no restriction on merging, and where the principal derivation may result in 
a negative clause with more than one literal. Each of these literals is resolved 
away by a corresponding reduction tree, in which one of the hypothesis clauses 
is reduced to an instance of its head atom. 

The proof procedures discussed above use the notions of Bottom Set or Ker- 
nel Set to deliberately restrict their respective search spaces. This is in contrast 
to some other recent approaches that attempt to search the complete IE hy- 
pothesis space. A technique based on Residue Hypotheses is proposed in [16] for 
Hypothesis Finding in general clausal logic. In principle, this approach subsumes 
all approaches for IE - including BG and KSS - because no restrictions are placed 
on B, H or e. But, in practice, it is not clear how Residue Hypotheses may be 
efficiently computed, or how language bias may be usefully incorporated into the 
reasoning process. An alternative method, based on Consequence Finding in full 
clausal logic, is proposed in [3] that supports a form of language bias called a 
production field and admits pruning strategies such as clause ordering. However, 
this procedure is still computationally expensive and has yet to achieve the same 
degree of practical success as less complete systems such as Progol. 



6 Conclusion 

In this paper the semantics of KSS has been extended from Horn clauses to 
general clausal logic. It was shown in the general case that KSS remains sound 
with respect to the task of IE and that it continues to subsume the semantics of 
BG. In addition, an extension of Plotkin’s C-derivation, called a K*-derivation, 
was introduced and shown to provide a sound and complete characterisation of 
the hypotheses derivable by KSS in the general case. These results can be seen 
as extending the essential principles of BG in order to enable the derivation 
of multiple clause hypotheses in general clausal logic, thereby enlarging the 
class of soluble problems. 

The aim of this work is to provide a general context in which to develop 
practical proof procedures for IE. It is believed such procedures can be developed 
for KSS through the integration of ALP and ILP methods and the efficient use of 
language bias. A hybrid ALP-ILP proof procedure has been proposed in [12] for 
computing multiple clause hypotheses, but currently this procedure is restricted 
to Horn clauses and has not been implemented. To address these issues, efficient 
abductive and inductive procedures are required for general clausal logic. One 
promising approach would be to adapt the work already begun by [1] and [3] in 
the context of semantic tableaux and to apply them in the context of KSS. 
Abstract. The extended answer set semantics for logic programs allows for the 
defeat of rules to resolve contradictions. We propose a refinement of these se- 
mantics based on a preference relation on extended literals. This relation, a strict 
partial order, induces a partial order on extended answer sets. The preferred an- 
swer sets, i.e. those that are minimal w.r.t. the induced order, represent the solu- 
tions that best comply with the stated preference on extended literals. In a further 
extension, we propose linearly ordered programs that are equipped with a linear 
hierarchy of preference relations. The resulting formalism is rather expressive 
and essentially covers the polynomial hierarchy. E.g. the membership problem 
for a program with a hierarchy of height $n$ is $\Sigma^P_{n+1}$-complete. We illustrate an 
application of the approach by showing how it can easily express hierarchically 
structured weak constraints, i.e. a layering of “desirable” constraints, such that 
one tries to minimize the set of violated constraints on lower levels, regardless of 
the violation of constraints on higher levels. 



1 Introduction 

In answer set programming (see e.g. [3,22]) one uses a logic program to modularly 
describe the requirements that must be fulfilled by the solutions to a problem. The so- 
lutions then correspond to the models (answer sets) of the program, which are usually 
defined through (a variant of) the stable model semantics [19]. The technique has been 
successfully applied in problem areas such as planning [22, 13, 14], configuration and 
verification [27, 28], diagnosis [12, 31], game theory [11], updates [15] and database 
repairs [2, 29]. 

The traditional answer set semantics is not universal, i.e. programs may not have 
any answer sets at all. While natural, this poses a problem in cases where, although 
there is no exact solution, one would appreciate obtaining an approximate one, even 
if it violates some of the rules. E.g. in an over-constrained timetabling problem, an 
approximate solution that ignores some demands of some users may be preferable to 
having no schedule at all. 
The extended answer set semantics from [29, 30] achieves this by allowing for the 
defeat of problematic rules. Consider for example the rules $a \leftarrow$, $b \leftarrow$ and $\neg a \leftarrow b$. 
Clearly, these rules are inconsistent and have no classical answer set, while both $\{a, b\}$ 
and $\{\neg a, b\}$ will be recognized as extended answer sets. In $\{a, b\}$, $\neg a \leftarrow b$ is defeated 
by $a \leftarrow$, while in $\{\neg a, b\}$, $a \leftarrow$ is defeated by $\neg a \leftarrow b$. 

In this paper, we extend the above semantics by equipping programs with a prefer- 
ence relation over extended literals (outcomes). Such a preference relation can be used 
to induce a partial order on the extended answer sets, the minimal elements of which 
will be preferred. In this way, the proposed extension allows one to select the more 
appropriate approximate solutions of an over-constrained problem. 

Consider for example a news redaction that has four different news items available 
that are described using the following extended answer sets: 

$N_1 = \{\textit{local}, \textit{politics}\}$ 

$N_2 = \{\textit{local}, \textit{sports}\}$ 

$N_3 = \{\textit{national}, \textit{economy}\}$ 

$N_4 = \{\textit{international}, \textit{economy}\}$. 

The redaction wishes to order them according to their preferences. Assuming that, 
regardless of the actual subject, local news is preferred over national or international 
items, the preference could be encoded as$^1$ 

$$\textit{local} < \textit{national} < \textit{international} < \{\textit{economy}, \textit{politics}, \textit{sports}\}.$$ 

Intuitively, using the above preference relation, $N_1$ and $N_2$ should be preferred over 
$N_3$, which should again be preferred over $N_4$, i.e. $N_1, N_2 \sqsubset N_3 \sqsubset N_4$. 

In the above example, only one preference relation is used, corresponding to one 
point of decision in the news redaction. In practice, different journalists may have con- 
flicting preferences, and different authorities. E.g., the editor-in-chief will have the final 
word on which item comes first, but she will restrict herself to the selection made by 
the journalists. Suppose the editor-in-chief has the following preference 

economy < politics < sports < {local, national, international} . 

Applying this preference to the preferred items $N_1$ and $N_2$ presented by the journalists 
yields the most preferred item $N_1$. 

Such hierarchies of preference relations are supported by linearly ordered programs, 
where a program is equipped with an ordered list of preference 
relations on extended literals, representing the hierarchy of user preferences ($<_i$ has 
a higher priority than $<_{i+1}$). Semantically, preferred extended answer sets for such 
programs will result from first optimizing w.r.t. $<_1$, then selecting from the result the 
optimal sets w.r.t. $<_2$, etc. Obviously, the order in which the preference relations are 
applied is important, e.g. exchanging the priorities of the preference relations in the 
example would yield $N_3$ as the preferred news item. 
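
The effect of applying the two preference relations in sequence can be replayed on the four news items. The Python sketch below is a simplified illustration rather than the formal semantics: it compares items on their ordinary literals only, following the informal reading that every literal of the worse item must be countered by a strictly better literal of the other, and it filters the candidates once per preference relation. All function names (`chain`, `below`, `select`, `lexicographic`) are hypothetical.

```python
def chain(*levels):
    """Strict order in which every literal of a level precedes all later levels."""
    return {(x, y) for i, low in enumerate(levels)
            for high in levels[i + 1:] for x in low for y in high}

def below(M, N, order):
    """M is at least as good as N: each literal only in N is beaten by one only in M."""
    return all(any((m, n) in order for m in M - N) for n in N - M)

def select(names, items, order):
    """Keep the candidates that no other candidate strictly improves upon."""
    return {m for m in names
            if not any(below(items[n], items[m], order)
                       and not below(items[m], items[n], order) for n in names)}

def lexicographic(items, orders):
    """Optimise w.r.t. the first order, then filter the result by the next, and so on."""
    names = set(items)
    for order in orders:
        names = select(names, items, order)
    return names

items = {
    "N1": {"local", "politics"},
    "N2": {"local", "sports"},
    "N3": {"national", "economy"},
    "N4": {"international", "economy"},
}
journalists = chain({"local"}, {"national"}, {"international"},
                    {"economy", "politics", "sports"})
editor = chain({"economy"}, {"politics"}, {"sports"},
               {"local", "national", "international"})

print(select(set(items), items, journalists))       # the journalists' selection
print(lexicographic(items, [journalists, editor]))  # journalists first, then editor
print(lexicographic(items, [editor, journalists]))  # priorities exchanged
```

With the journalists' preference applied first the sketch retains the local politics item, and with the priorities exchanged it retains the national economy item, so the outcome depends on the order of application as described above.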

1 We use $a < X$, with $X$ a set, as an abbreviation for $\{a < x \mid x \in X\}$. 
It turns out that such hierarchically layered preference relations are very expressive. 
More specifically, we show that such programs can solve arbitrary complete problems 
of the polynomial hierarchy. 

In [8], weak constraints are introduced as a type of constraint 2 that is “desirable” but 
may be violated if there are no other options, i.e. violations of weak constraints should 
be minimized. The framework also supports hierarchically structured weak constraints, 
where constraints on the lower levels are more important than constraints on higher 
levels. Mirroring the semantics for linearly ordered programs, solutions minimizing the 
violation of constraints on the lowest level are first selected, and, among those, the 
solutions that minimize the constraints on the second level are retained, continuing up 
to the highest level. Weak constraints are useful in areas such as planning, abduction and 
optimizations from graph theory [16, 10]. It will be shown that hierarchically structured 
weak constraints can be easily captured by linearly ordered programs. 

The remainder of the paper is organized as follows. In Section 2, we present the ex- 
tended answer set semantics together with a preference relation on extended literals and 
illustrate how it can be used to elegantly express common problems. Section 3 
introduces linearly ordered programs, and the complexity of the proposed semantics is discussed 
in Section 4. Before concluding and giving directions for further research in Section 6, 
we show in Section 5 how weak constraints can be implemented using linearly ordered 
programs. 

2 Ordered Programs 

We use the following basic definitions and notation. A literal is an atom $a$ or a negated 
atom $\neg a$. For a set of literals $X$, $\neg X$ denotes $\{\neg a \mid a \in X\}$ where $\neg\neg a = a$. $X$ is 
consistent if $X \cap \neg X = \emptyset$. An interpretation $I$ is a consistent set of (ordinary) literals. 

An extended literal is a literal or a naf-literal of the form $\textit{not}\ l$ where $l$ is a literal. 
The latter form denotes negation as failure. For a set of extended literals $X$, we use $X^-$ 
to denote the set of ordinary literals underlying the naf-literals in $X$, i.e. $X^- = \{l \mid 
\textit{not}\ l \in X\}$. An extended literal $l$ is true w.r.t. an interpretation $I$, denoted $I \models l$, if $l \in I$ 
in case $l$ is ordinary, or $a \notin I$ if $l = \textit{not}\ a$ for some ordinary literal $a$. As usual, $I \models X$ 
for some set of (extended) literals $X$ iff $\forall l \in X \cdot I \models l$. 

An extended rule is a rule of the form $\alpha \leftarrow \beta$ where $\alpha \cup \beta$ is a finite set of extended 
literals$^3$ and $|\alpha| \le 1$. An extended rule $r = \alpha \leftarrow \beta$ is satisfied by $I$, denoted $I \models r$, 
if $I \models \alpha$ and $\alpha \neq \emptyset$, whenever $I \models \beta$, i.e. if $r$ is applicable ($I \models \beta$), then it must be 
applied ($I \models \alpha \cup \beta \land \alpha \neq \emptyset$). Note that this implies that a constraint, i.e. a rule with 
empty head ($\alpha = \emptyset$), can only be satisfied if it is not applicable ($I \not\models \beta$). 

A countable set of extended rules is called an extended logic program (ELP). The 
Herbrand base $\mathcal{B}_P$ of an ELP $P$ contains all atoms appearing in $P$. Further, we use $\mathcal{L}_P$ 
and $\mathcal{L}^*_P$ to denote the set of literals (resp. extended literals) that can be constructed from 
$\mathcal{B}_P$, i.e. $\mathcal{L}_P = \mathcal{B}_P \cup \neg\mathcal{B}_P$ and $\mathcal{L}^*_P = \mathcal{L}_P \cup \{\textit{not}\ l \mid l \in \mathcal{L}_P\}$. For an ELP $P$ and an 
interpretation $I$ we use $P_I \subseteq P$ to denote the reduct of $P$ w.r.t. $I$, i.e. $P_I = \{r \in P \mid 
I \models r\}$, the set of rules satisfied by $I$. We call an interpretation $I$ a model of a program 
$P$ if $P_I = P$, i.e. $I$ satisfies all rules in $P$. It is a minimal model of $P$ if there is no 
model $J$ of $P$ such that $J \subset I$. 

2 A constraint is a rule of the form $\leftarrow \alpha$, i.e. with an empty head. Any answer set should 
therefore not contain $\alpha$. 

3 As usual, we assume that programs have already been grounded. 

A simple program is a program without negation as failure. For simple programs 
$P$, we define an answer set of $P$ as the minimal model of $P$. On the other hand, for a 
program $P$ containing negation as failure, we define the GL-reduct [19] for $P$ w.r.t. $I$, 
denoted $P^I$, as the program consisting of those rules $(\alpha \setminus \textit{not}\ \alpha^-) \leftarrow (\beta \setminus \textit{not}\ \beta^-)$ where 
$\alpha \leftarrow \beta$ is in $P$, $I \models \textit{not}\ \beta^-$ and $I \models \alpha^-$. 

Note that all rules in $P^I$ are free from negation as failure, i.e. $P^I$ is a simple program. 
An interpretation $I$ is then an answer set of $P$ iff $I$ is a minimal model of the 
GL-reduct $P^I$. An extended rule $r = \alpha \leftarrow \beta$ is defeated w.r.t. $P$ and $I$ iff there exists 
an applied competing rule $r' = \alpha' \leftarrow \beta'$ such that $\{\alpha, \alpha'\}$ is inconsistent. An extended 
answer set for $P$ is any interpretation $I$ such that $I$ is an answer set of $P_I$ and each 
unsatisfied rule in $P \setminus P_I$ is defeated. 

Example 1 . Consider the ELP P shown below. The program describes a choice be- 
tween speeding or not. Sticking to the indicated limit guarantees not getting a fine while 
speeding, when it is known that the police are carrying out checks, will definitely result 
in a fine. Finally, if nothing is known about checks, there is still a chance for a fine. 



speeding ←                      fine ← speeding, check 
¬speeding ←                     maybe-fine ← speeding, not check 
check ←                         ¬fine ← ¬speeding 
not check ←                     fine ← maybe-fine 
not fine ← maybe-fine 



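The above program has five possible extended answer sets, which are $M_1 = \{\textit{speeding}, 
\textit{check}, \textit{fine}\}$, $M_2 = \{\textit{speeding}, \textit{maybe-fine}\}$, $M_3 = \{\neg\textit{speeding}, \textit{check}, \neg\textit{fine}\}$, 
$M_4 = \{\neg\textit{speeding}, \neg\textit{fine}\}$ and $M_5 = \{\textit{speeding}, \textit{maybe-fine}, \textit{fine}\}$. 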
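
For small ground programs the definitions above can be executed directly. The following Python sketch is a brute-force illustration under stated assumptions, not an implementation taken from the paper: literals are strings with a leading `-` for classical negation and a leading `not ` for negation as failure, a rule is a pair of an optional head and a tuple of body literals, and the rule list for Example 1 is our reading of the program above. All function names are hypothetical.

```python
from itertools import combinations

def compl(l):
    """Classical complement of an ordinary literal ('a' <-> '-a')."""
    return l[1:] if l.startswith("-") else "-" + l

def is_naf(x):
    return x.startswith("not ")

def true_in(x, I):
    """Truth of an extended literal x w.r.t. interpretation I."""
    return x[4:] not in I if is_naf(x) else x in I

def applicable(rule, I):
    return all(true_in(x, I) for x in rule[1])

def applied(rule, I):
    return rule[0] is not None and applicable(rule, I) and true_in(rule[0], I)

def satisfied(rule, I):
    return not applicable(rule, I) or applied(rule, I)

def clash(h1, h2):
    """Heads are inconsistent: l against -l, or l against 'not l'."""
    if h1 is None or h2 is None or (is_naf(h1) and is_naf(h2)):
        return False
    if is_naf(h1):
        return h1[4:] == h2
    if is_naf(h2):
        return h2[4:] == h1
    return h1 == compl(h2)

def is_answer_set(rules, I):
    """Check that I is the least model of the GL-reduct of `rules` w.r.t. I."""
    reduct = []
    for head, body in rules:
        if not all(true_in(x, I) for x in body if is_naf(x)):
            continue                      # some 'not l' in the body is false
        if head is not None and is_naf(head):
            if head[4:] not in I:
                continue                  # naf head whose literal is not in I
            head = None                   # otherwise it acts as a constraint
        reduct.append((head, [x for x in body if not is_naf(x)]))
    least, changed = set(), True
    while changed:
        changed = False
        for head, body in reduct:
            if set(body) <= least:
                if head is None:
                    return False          # a constraint fires
                if head not in least:
                    least.add(head)
                    changed = True
    return least == set(I)

def extended_answer_sets(rules):
    lits = {x[4:] if is_naf(x) else x
            for head, body in rules
            for x in ([head] if head else []) + list(body)}
    universe = sorted(lits | {compl(l) for l in lits})
    found = []
    for size in range(len(universe) + 1):
        for combo in combinations(universe, size):
            I = frozenset(combo)
            if any(compl(l) in I for l in I):
                continue                  # inconsistent, so not an interpretation
            sat = [r for r in rules if satisfied(r, I)]
            unsat = [r for r in rules if not satisfied(r, I)]
            if all(any(applied(o, I) and clash(r[0], o[0]) for o in rules)
                   for r in unsat) and is_answer_set(sat, I):
                found.append(I)
    return found

# Assumed encoding of the program of Example 1.
P = [("speeding", ()), ("-speeding", ()), ("check", ()), ("not check", ()),
     ("not fine", ("maybe_fine",)),
     ("fine", ("speeding", "check")),
     ("maybe_fine", ("speeding", "not check")),
     ("-fine", ("-speeding",)),
     ("fine", ("maybe_fine",))]

for M in extended_answer_sets(P):
    print(sorted(M))
```

The enumeration is exponential in the number of literals and is only meant to make the definitions concrete; with the encoding above it yields the five sets listed for Example 1.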



Unlike traditional answer sets, extended answer sets are, in general, not subset minimal, 
i.e. an ELP $P$ can have extended answer sets $M$ and $N$ with $M \subset N$, as witnessed 
by $M_3$ and $M_4$ in Example 1. Moreover, a program can have both answer sets and 
extended answer sets (that are not answer sets). E.g. the ELP $\{a \leftarrow;\ \textit{not}\ a \leftarrow \textit{not}\ a\}$ has 
two extended answer sets: $I = \{a\}$, which is also a traditional answer set, and $J = \emptyset$, 
which is not. 

Often, certain extended answer sets are preferable over others. E.g., in Example 1, 
one would obviously prefer not to get fined, which can be represented as a strict partial 
order $<_1$ on literals: $<_1\ = \neg\textit{fine} < \mathcal{L}^*_P \setminus \{\neg\textit{fine}\}$. However, in an emergency, 
one may prefer to ignore the speed limit, resulting in an alternative preference relation 
$<_2\ = \{\textit{speeding} < \neg\textit{speeding} < C;\ \textit{maybe-fine} < \textit{fine} < C\}$, where $C = 
\mathcal{L}^*_P \setminus \{\textit{speeding}, \neg\textit{speeding}, \textit{maybe-fine}, \textit{fine}\}$. 

In general, we introduce, for an ELP $P$, a strict partial order$^4$ $<$ on extended literals, 
such that, for two extended literals $l_1$ and $l_2$, $l_1 < l_2$ expresses that $l_1$ is more preferred 
than $l_2$. This preference relation induces a partial order$^5$ $\sqsubseteq$ on the extended answer sets 
of $P$. 

4 A strict partial order $<$ on a set $X$ is a binary relation on $X$ that is antisymmetric, anti-reflexive 
and transitive. The relation $<$ is well-founded if every nonempty subset of $X$ has a $<$-minimal 
element. 

184 Davy Van Nieuwenborgh, Stijn Heymans, and Dirk Vermeir 

Definition 1. An ordered program is a pair $(P, <)$ where $P$ is an ELP and $<$ is a 
strict well-founded$^6$ partial order on $\mathcal{L}^*_P$. For subsets $M, N \subseteq \mathcal{L}_P$, we define $M \sqsubseteq N$ 
iff $\forall n \in \{l \in \mathcal{L}^*_P \mid N \models l \land M \not\models l\} \cdot \exists m \in \{l \in \mathcal{L}^*_P \mid M \models l \land N \not\models l\} \cdot m < n$. A 
preferred answer set of $(P, <)$ is an extended answer set of $P$ that is minimal w.r.t. $\sqsubseteq$ 
among the set of extended answer sets of $P$. 

Intuitively, $M$ is preferable over $N$, i.e. $M \sqsubseteq N$, if every literal $n$ from $N \setminus M$ is 
“countered” by a “better” literal $m < n$ from $M \setminus N$. 

Applying the above definition on Example 1 with $<_1$ as described above, yields 
both $M_3$ and $M_4$ as preferred answer sets, while $M_2$ is the only preferred answer set 
w.r.t. $<_2$, which fits our intuition in both cases. 

In the sequel, we will specify a strict partial order over a set X using an expression 
of the form 

$$L = \{x_1 < y_1, \dots, x_n < y_n, z_1, \dots, z_m\}$$ 

which stands, unless explicitly stated otherwise, for the strict partial order defined by 

$$\{x_i < y_i \mid 1 \le i \le n\}$$ 
$$\cup\ \{x_i < u \mid 1 \le i \le n \land u \in U\}$$ 
$$\cup\ \{y_i < u \mid 1 \le i \le n \land u \in U\}$$ 
$$\cup\ \{z_j < u \mid 1 \le j \le m \land u \in U\}$$ 

with $U$ the set of elements not occurring in $L$, i.e. $U = X \setminus K$ with $K = \{x_i \mid 1 \le i \le 
n\} \cup \{y_i \mid 1 \le i \le n\} \cup \{z_j \mid 1 \le j \le m\}$. 

This notation implies that extended literals outside of $L$ are least preferred and thus 
cannot influence the preference among extended answer sets. E.g. if $L = \{a < b\}$ then 
both $\{a, c\} \sqsubset \{b\}$ and $\{a\} \sqsubset \{b, c\}$ (where $M \sqsubset N$ iff $M \sqsubseteq N$ and $M \neq N$). 

Example 2. The program below offers a choice between a large and a small drink to go 
with spicy or mild food. 



large-drink ← not small-drink 
small-drink ← not large-drink 
spicy ← not mild 
mild ← not spicy 
large-drink ← spicy 

There are three extended answer sets: $M_1 = \{\textit{large-drink}, \textit{spicy}\}$, $M_2 = \{\textit{large-drink}, 
\textit{mild}\}$ and $M_3 = \{\textit{small-drink}, \textit{mild}\}$. 

A smaller drink is preferred and there is no particular preference between mild and 
spicy, yielding {small-drink < large-drink , mild , spicy} to describe the prefer- 
ences. The preference for a small drink causes M3 Z M 2 while Mi is incomparable 
with both M2 and M3. Thus both Mi and M3 are preferred. 
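Example 2 can be checked mechanically. The following Python sketch is an illustration only and not part of the original development: it hard-codes the three extended answer sets listed above, builds the order using the expansion convention, and applies Definition 1. The helper names (`satisfied`, `expand`, `leq`, `preferred`) are hypothetical, and no transitive closure of the order is computed because this example does not need one.

```python
ATOMS = ["large-drink", "small-drink", "spicy", "mild"]
# Extended literals: every atom l together with its negation as failure "not l".
UNIVERSE = set(ATOMS) | {f"not {a}" for a in ATOMS}


def satisfied(interp):
    """Extended literals that hold in an interpretation."""
    return set(interp) | {f"not {a}" for a in ATOMS if a not in interp}


def expand(pairs, singles=()):
    """Order denoted by {x1 < y1, ..., z1, ...}: listed elements precede all others."""
    named = {e for pair in pairs for e in pair} | set(singles)
    return set(pairs) | {(k, u) for k in named for u in UNIVERSE - named}


def leq(m, n, order):
    """Definition 1: each literal only in n is countered by a better one only in m."""
    only_m = satisfied(m) - satisfied(n)
    only_n = satisfied(n) - satisfied(m)
    return all(any((a, b) in order for a in only_m) for b in only_n)


def preferred(answer_sets, order):
    """Answer sets that are minimal with respect to leq."""
    return [m for m in answer_sets
            if not any(n != m and leq(n, m, order) for n in answer_sets)]


M1 = frozenset({"large-drink", "spicy"})
M2 = frozenset({"large-drink", "mild"})
M3 = frozenset({"small-drink", "mild"})
ORDER = expand([("small-drink", "large-drink")], ["mild", "spicy"])

print(leq(M3, M2, ORDER))               # is M3 at least as preferred as M2?
print(preferred([M1, M2, M3], ORDER))   # the minimal answer sets
```

Enumerating the answer sets by hand keeps the sketch independent of any particular answer set solver.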

5 That $\sqsubseteq$ is a partial order follows from Theorem 6 in [29].

6 It is easy to verify that, if $<$ is empty, then $M \sqsubseteq N$ iff $\{l \mid N \models l\} \subseteq \{l \mid M \models l\}$, which

reduces to $N = M$, i.e. all extended answer sets are preferred.

There is no “one true way” to induce a preference relation $\sqsubseteq$ over extended answer
sets from a particular ordering $<$ over extended literals. E.g. [26], which deals with
traditional answer sets of extended disjunctive programs, proposes a different method
which is shown below, adapted to the current framework.

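Definition 2. For an ordered program $(P, <)$ and subsets $M, N \subseteq \mathcal{L}_P$, we define
$M \sqsubseteq_s N$ iff

- $M = N$ (reflexive), or

- $\exists L \cdot M \sqsubseteq_s L \wedge L \sqsubseteq_s N$ (transitive), or

- $\exists e_1 \in \{l \in \mathcal{L}^*_P \mid M \models l \wedge N \not\models l\}$ such that

$\exists e_2 \in \{l \in \mathcal{L}^*_P \mid N \models l \wedge M \not\models l\} \cdot e_1 < e_2$
$\wedge\ \neg\exists e_3 \in \{l \in \mathcal{L}^*_P \mid N \models l \wedge M \not\models l\} \cdot e_3 < e_1$.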

Obviously, $\sqsubseteq_s$ is also a partial order.

Theorem 1. Let $(P, <)$ be an ordered program and let $M$ and $N$ be two extended
answer sets of $P$. Then $M \sqsubseteq N$ implies that $M \sqsubseteq_s N$.

The other direction is in general not true, as appears from the following example. 
Example 3. Consider the program P depicted below. 

shares ← not cash          cash ← not shares

\$100 ← shares             \$1000 ← cash

stock-options ← shares

This program has two extended answer sets, i.e. $M_1$ = {cash, \$1000} and $M_2$ =
{shares, \$100, stock-options}.

The preference $<$ = {\$1000 < \$100}, where we take $<$ as is, i.e. without applying
the expansion introduced above, expresses no preference between cash, shares and
stock options, except that, obviously, a larger amount of cash is preferred over a smaller
amount. Using the $\sqsubseteq_s$ preference relation, we get $M_1 \sqsubseteq_s M_2$ and $M_2 \not\sqsubseteq_s M_1$, while
for the $\sqsubseteq$ relation both $M_1 \not\sqsubseteq M_2$ and $M_2 \not\sqsubseteq M_1$ hold. This makes $M_1$ preferred w.r.t.
$\sqsubseteq_s$, while both $M_1$ and $M_2$ are preferred w.r.t. $\sqsubseteq$. Note that, e.g. $M_1 \not\sqsubseteq M_2$ because
$M_1$ cannot counter the stock-options of $M_2$ by something more preferred.
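The contrast between the two relations in Example 3 can be reproduced with a few lines of Python. This is a hedged sketch: the function names are hypothetical, the two extended answer sets are entered by hand, and `leq_s_direct` implements only the third clause of Definition 2 (the reflexive and transitive clauses are deliberately left out, which is enough to exhibit the positive claim for this two-set example).

```python
ATOMS = ["cash", "shares", "$100", "$1000", "stock-options"]


def satisfied(interp):
    """Extended literals (l and "not l") that hold in an interpretation."""
    return set(interp) | {f"not {a}" for a in ATOMS if a not in interp}


def differences(m, n):
    """Extended literals holding only in m, and those holding only in n."""
    sm, sn = satisfied(m), satisfied(n)
    return sm - sn, sn - sm


def leq(m, n, order):
    """Definition 1: every literal only in n must be countered from m."""
    only_m, only_n = differences(m, n)
    return all(any((a, b) in order for a in only_m) for b in only_n)


def leq_s_direct(m, n, order):
    """Third clause of Definition 2 only; no reflexive or transitive closure."""
    only_m, only_n = differences(m, n)
    return any(
        any((e1, e2) in order for e2 in only_n)
        and not any((e3, e1) in order for e3 in only_n)
        for e1 in only_m
    )


M1 = frozenset({"cash", "$1000"})
M2 = frozenset({"shares", "$100", "stock-options"})
ORDER = {("$1000", "$100")}   # taken as is, without the expansion

print(leq_s_direct(M1, M2, ORDER), leq_s_direct(M2, M1, ORDER))
print(leq(M1, M2, ORDER), leq(M2, M1, ORDER))
```

A single better literal suffices for the clause implemented in `leq_s_direct`, whereas `leq` demands a counter for every literal in the difference; that is exactly why the uncountered stock-options block the comparison under Definition 1.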

In general, $\sqsubseteq$ makes no decision between extended answer sets containing unrelated
literals in their differences, while $\sqsubseteq_s$ is more credulous, preferring e.g. \$1000 over \$100
and some “unknown” stock-options.

In the next section, the skeptical approach of $\sqsubseteq$ will turn out to be useful since it
allows for new information to refine an earlier result. E.g., if, in the above example, it is
later learned that the stock options provide great value, stock-options < \$1000 might
be added, possibly in a different preference relation, and used to prefer $M_2$ among the
earlier “best choices” $M_1$ and $M_2$.

From Theorem 1, the following is immediate. 

Corollary 1. Let $R = (P, <)$ be an ordered program. Then, the preferred answer sets
of $R$ w.r.t. $\sqsubseteq_s$ are also preferred w.r.t. $\sqsubseteq$.



186 Davy Van Nieuwenborgh, Stijn Heymans, and Dirk Vermeir 



3 Linear n-Ordered Programs 

A strict partial order on literals, as defined in the previous section, is a powerful and 
flexible tool to express a wide range of preferences. However, in practice, it is some- 
times useful to have different layers of preferences, each applied in turn. As an example, 
consider the staff selection procedure of a company. Job applicants are divided into cer- 
tain profiles, e.g. either female or male, old or young, experienced or not. Further, it is 
believed that inexperienced applicants tend to be ambitious, which is captured by the 
following program. 

female ← not male          male ← not female

old ← not young            young ← not old

experienced ← not inexperienced          inexperienced ← not experienced

ambitious ← inexperienced

The decision to hire a new staff member goes through a chain of decision makers. On
the lowest, and most preferred, level, company policy is implemented. It stipulates that
experienced persons are to be preferred over inexperienced and ambitious persons, i.e.
$<_1$ = {experienced < {inexperienced, ambitious}}. On the second level, the financial
department prefers young and inexperienced employees, since they tend to cost less,
i.e. $<_2$ = {young < old, inexperienced < experienced}. On the last, weakest, level,
the manager prefers a woman to reinforce her largely male team, i.e. $<_3$ = {female <
male}.

In this example, any preferred extended answer set should be preferred w.r.t. $\sqsubseteq_1$
(induced by $<_1$) among all extended answer sets and, furthermore, among the $\sqsubseteq_1$-
preferred sets, it should also be $\sqsubseteq_2$-preferred (where $\sqsubseteq_2$ is induced by $<_2$). Finally, the
preferred answer sets of the complete problem are the $\sqsubseteq_2$-preferred sets which are also
$\sqsubseteq_3$-preferred (where $\sqsubseteq_3$ is induced by $<_3$).

Formally, we extend ordered programs by allowing a linearly ordered set of pref-
erence relations $<_1, \ldots, <_n$ for an ELP $P$, where $<_1$ is the order with the highest
priority.

Definition 3. A linearly ordered program (LOLP) is a pair $(P, (<_i)_{i=1 \ldots n})$ where $P$
is an ELP and $(<_i)_{i=1 \ldots n}$ is a sequence of (strict partial order) preference relations
$<_1, \ldots, <_n$. Each of these orders $<_i$ induces a preference relation $\sqsubseteq_i$ between extended
answer sets, as in Definition 1.

We define the preference up to a certain order of extended answer sets by induction.

Definition 4. Let $(P, (<_i)_{i=1 \ldots n})$ be a LOLP. An extended answer set $M$ is preferable
up to $<_i$, $1 \le i \le n$, iff

- $i = 1$ and $M$ is preferred w.r.t. $\sqsubseteq_1$, or

- $i > 1$, $M$ is preferable up to $<_{i-1}$, and there is no $N$, preferable up to $<_{i-1}$, such
that $N \sqsubset_i M$.

An extended answer set $M$ of $P$ is preferred if it is preferable up to $<_n$.

Continuing the above example, we have eight extended answer sets for the pro-
gram, which are all preferable up to $<_0$. After applying $<_1$, only four of them are
left, i.e. $M_1$ = {experienced, old, female}, $M_2$ = {experienced, young, female},
$M_3$ = {experienced, male, young} and $M_4$ = {experienced, old, male}, which fits
the company policy to drop inexperienced ambitious people. When $<_2$ is applied to
these four remaining extended answer sets, only $M_2$ and $M_3$ are kept as preferable up
to $<_2$. Finally, the manager will select $M_2$ as the only extended answer set preferable
up to $<_3$.

Note that rearranging the chain of orders gives, in general, different results. E.g.,
interchanging $<_1$ with $<_2$ yields {young, female, ambitious, inexperienced} as the
only extended answer set preferable up to $<_3$.
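The level-by-level filtering of Definition 4 can be sketched in Python for the staff selection example. The code is illustrative and makes several assumptions that are not in the text: the eight extended answer sets are enumerated by hand rather than computed by a solver, $<_1$ is read as the two pairs experienced < inexperienced and experienced < ambitious, each order is completed with the expansion convention of Section 2, and all function names are hypothetical.

```python
from itertools import product

ATOMS = ["female", "male", "old", "young",
         "experienced", "inexperienced", "ambitious"]
UNIVERSE = set(ATOMS) | {f"not {a}" for a in ATOMS}


def satisfied(interp):
    """Extended literals (l and "not l") that hold in an interpretation."""
    return set(interp) | {f"not {a}" for a in ATOMS if a not in interp}


def expand(pairs):
    """Expansion convention: elements named in the order precede all others."""
    named = {e for pair in pairs for e in pair}
    return set(pairs) | {(k, u) for k in named for u in UNIVERSE - named}


def leq(m, n, order):
    """Definition 1 for a single order."""
    only_m = satisfied(m) - satisfied(n)
    only_n = satisfied(n) - satisfied(m)
    return all(any((a, b) in order for a in only_m) for b in only_n)


def preferable_up_to(answer_sets, orders):
    """Definition 4: apply the orders in sequence, keeping minimal sets each time."""
    current = list(answer_sets)
    for order in orders:
        current = [m for m in current
                   if not any(n != m and leq(n, m, order) for n in current)]
    return current


# The eight extended answer sets of the program, listed by hand.
ANSWER_SETS = []
for sex, age, exp in product(["female", "male"], ["old", "young"],
                             ["experienced", "inexperienced"]):
    extra = {"ambitious"} if exp == "inexperienced" else set()
    ANSWER_SETS.append(frozenset({sex, age, exp} | extra))

O1 = expand([("experienced", "inexperienced"), ("experienced", "ambitious")])
O2 = expand([("young", "old"), ("inexperienced", "experienced")])
O3 = expand([("female", "male")])

for m in preferable_up_to(ANSWER_SETS, [O1, O2, O3]):
    print("policy first: ", sorted(m))
for m in preferable_up_to(ANSWER_SETS, [O2, O1, O3]):
    print("finance first:", sorted(m))
```

Passing the orders as a list makes the dependence on their sequence explicit: swapping the first two entries is all it takes to reproduce the second outcome mentioned above.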

4 Complexity 

We first briefly recall some relevant notions of complexity theory (see e.g. [24, 3] for
a nice introduction). The class P (NP) represents the problems that are determinis-
tically (nondeterministically) decidable in polynomial time, while coNP contains the
problems whose complements are in NP.

The polynomial hierarchy, denoted PH, is made up of three classes of problems,
i.e. $\Delta^P_k$, $\Sigma^P_k$ and $\Pi^P_k$, $k \ge 0$, which are defined as follows:

1. $\Delta^P_0 = \Sigma^P_0 = \Pi^P_0 = P$; and

2. $\Delta^P_{k+1} = P^{\Sigma^P_k}$, $\Sigma^P_{k+1} = NP^{\Sigma^P_k}$, $\Pi^P_{k+1} = co\Sigma^P_{k+1}$.

The class $P^{\Sigma^P_k}$ ($NP^{\Sigma^P_k}$) represents the problems decidable in deterministic (nonde-
terministic) polynomial time using an oracle for problems in $\Sigma^P_k$, where an oracle is a
subroutine capable of solving $\Sigma^P_k$ problems in unit time. Note that $\Delta^P_1 = P$, $\Sigma^P_1 = NP$
and $\Pi^P_1 = coNP$. Further, it is obvious that $\Sigma^P_k \subseteq \Sigma^P_k \cup \Pi^P_k \subseteq \Delta^P_{k+1} \subseteq \Sigma^P_{k+1}$, but
for $k \ge 1$ any equality is considered unlikely. Further, the class PH is defined by

$PH = \bigcup_{k=0}^{\infty} \Sigma^P_k$

A language $L$ is called complete for a complexity class $C$ if both $L$ is in $C$ and $L$
is hard for $C$. Showing that $L$ is hard is normally done by reducing a known complete
decision problem into a decision problem in $L$. For the classes $\Sigma^P_k$ and $\Pi^P_k$ with $k > 0$
a known complete, under polynomial time transformation, problem is checking whether
a quantified boolean formula (QBF) $\phi$ is valid. Note that this does not hold for the class
PH for which no complete problem is known unless P = NP.

Quantified boolean formulas are expressions of the form $Q_1 X_1 Q_2 X_2 \ldots Q_k X_k \cdot G$,
where $k \ge 1$, $G$ is a Boolean expression over the atoms of the pairwise disjoint nonempty sets
of variables $X_1, \ldots, X_k$ and the $Q_i$'s, for $i = 1, \ldots, k$, are alternating quantifiers from
$\{\exists, \forall\}$. When $Q_1 = \exists$, the QBF is $k$-existential; when $Q_1 = \forall$ we say it is $k$-universal.
We use $QBF_{k,\exists}$ ($QBF_{k,\forall}$) to denote the set of all valid $k$-existential ($k$-universal)
QBFs.

Deciding, for a given $k$-existential ($k$-universal) QBF $\phi$, whether $\phi \in QBF_{k,\exists}$
($\phi \in QBF_{k,\forall}$) is a $\Sigma^P_k$-complete ($\Pi^P_k$-complete) problem.
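To make the notion of QBF validity concrete, the following Python sketch decides it by exhaustively expanding the quantifier prefix. It is an illustration only: the representation (a prefix as a list of `("E" | "A", variables)` pairs, a matrix given in disjunctive normal form with literals written `"x"` or `"-x"`) and the names `dnf` and `qbf_valid` are assumptions made here, and the procedure takes exponential time, so it says nothing about the complexity bounds themselves.

```python
from itertools import product


def dnf(conjunctions):
    """Matrix G in disjunctive normal form; a literal is "x" or "-x"."""
    def holds(literal, assignment):
        if literal.startswith("-"):
            return not assignment[literal[1:]]
        return assignment[literal]
    return lambda assignment: any(
        all(holds(lit, assignment) for lit in conj) for conj in conjunctions)


def qbf_valid(prefix, matrix, assignment=None):
    """Decide validity by trying every assignment block by block.

    prefix is a list of (quantifier, variables) pairs, quantifier "E" or "A".
    """
    assignment = dict(assignment or {})
    if not prefix:
        return matrix(assignment)
    (quantifier, variables), rest = prefix[0], prefix[1:]
    results = (
        qbf_valid(rest, matrix, {**assignment, **dict(zip(variables, values))})
        for values in product([False, True], repeat=len(variables))
    )
    return any(results) if quantifier == "E" else all(results)


# exists x forall y . (x and y) or (x and not y): valid, choose x = true
G1 = dnf([["x", "y"], ["x", "-y"]])
print(qbf_valid([("E", ["x"]), ("A", ["y"])], G1))

# exists x forall y . (x and y) or (not x and not y): not valid
G2 = dnf([["x", "y"], ["-x", "-y"]])
print(qbf_valid([("E", ["x"]), ("A", ["y"])], G2))
```

Swapping the two quantifiers in the second formula makes it valid, since $y$ can then be chosen to match $x$.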

The following results shed some light on the complexity of the preferred answer set 
semantics for linear n-ordered logic programs. 

First of all, checking whether an interpretation $I$ is an extended answer set of an ELP
$P$ is in P, because (a) checking if each rule in $P$ is either satisfied or defeated w.r.t.
$I$, (b) applying the GL-reduct on $P_I$ w.r.t. $I$, i.e. computing $(P_I)^I$, and (c) checking
whether the positive program $(P_I)^I$ has $I$ as its unique minimal model, can all be done
in polynomial time.

On the other hand, the complexity of checking whether an extended answer set $M$
is not preferable up to a certain $<_n$ depends on $n$, as shown in the next lemma.

Lemma 1. Let $(P, (<_i)_{i=1 \ldots n})$ be a LOLP, and let $M$ be an extended answer set of
$P$. Checking whether $M$ is not preferable up to $<_n$ is in $\Sigma^P_n$.

Proof. The proof is by induction on $n$.

The base case, i.e. $n = 0$, holds vacuously as checking whether $M$ is an extended
answer set is in $P = \Sigma^P_0$.

For the induction step, checking that $M$ is not preferable up to $<_n$ can be done by
(a) checking that $M$ is (or is not) preferable up to $<_{n-1}$, which is in $\Sigma^P_{n-1}$ due to the
induction hypothesis; and (b) guessing, if $M$ is preferable up to $<_{n-1}$, an interpretation
$N \sqsubset_n M$ and checking that it is not the case that $N$ is not preferable up to $<_{n-1}$,
which is again in $\Sigma^P_{n-1}$ due to the induction hypothesis. As a result, at most two calls
are made to a $\Sigma^P_{n-1}$ oracle and at most one guess is made, yielding that the problem
itself is in $NP^{\Sigma^P_{n-1}} = \Sigma^P_n$. □

Using the above yields the following theorem about the complexity of LOLPs. 

Theorem 2. Let $(P, (<_i)_{i=1 \ldots n})$ be a LOLP and $l$ a literal. Deciding whether there is
a preferred answer set containing $l$ is in $\Sigma^P_{n+1}$.

Proof. The task can be performed by an NP-algorithm that guesses an interpretation
$M \ni l$ and checks that it is not the case that $M$ is not preferable up to level $n$. Due to
Lemma 1, the latter is in $\Sigma^P_n$, so the former is in $NP^{\Sigma^P_n} = \Sigma^P_{n+1}$. □

Theorem 3. Let $(P, (<_i)_{i=1 \ldots n})$ be a LOLP and $l$ a literal. Deciding whether every
preferred answer set contains $l$ is in $\Pi^P_{n+1}$.

Proof. Due to Theorem 2, finding a preferred answer set $M$ not containing $l$, i.e. $l \notin M$,
is in $\Sigma^P_{n+1}$. Hence, the complement of the problem is in $\Pi^P_{n+1}$. □

To prove hardness, we provide a reduction of deciding validity of QBFs by means 
of LOLPs. 

Theorem 4. The problem of deciding, given a LOLP $(P, (<_i)_{i=1 \ldots n})$ and a literal $l$,
whether there exists a preferred answer set containing $l$ is $\Sigma^P_{n+1}$-hard.

Proof. (Sketch). Let $\phi = \exists X_1 \forall X_2 \ldots Q X_{n+1} \cdot G \in QBF_{n+1,\exists}$, where $Q = \forall$ if
$n$ is odd and $Q = \exists$ otherwise. We assume, without loss of generality, that $G$ is in
disjunctive normal form, i.e. $G = \bigvee_{c \in C} c$ where $C$ is a set of sets of literals over
$X_1 \cup \ldots \cup X_{n+1}$ and each $c \in C$ has to be read as a conjunction. In what follows, we
will write $l <_i X_{p \ldots q}$ to denote the longer $\{l <_i x;\ l <_i \neg x \mid x \in X_j \wedge p \le j \le q\}$.
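
The LOLP $(P_\phi, (<_i)_{i=1 \ldots n})$ corresponding to $\phi$ is defined by the ELP $P_\phi$:

$P_1 : \{x \leftarrow\ ;\ \neg x \leftarrow\ \mid x \in X_i \wedge 1 \le i \le n+1\}$

$P_2 : \{g \leftarrow c \mid c \in C\}$

$P_3 : \mathit{sat} \leftarrow g$

$P_4 : \neg\mathit{sat} \leftarrow \mathit{not}\ g$

and the sequence $(<_i)_{i=1 \ldots n}$ of orders defined by

$\{\neg\mathit{sat} <_n \mathit{sat}, g <_n X_{2 \ldots n+1},\ X_1\}$
$\{\mathit{sat}, g <_{n-1} \neg\mathit{sat} <_{n-1} X_{3 \ldots n+1},\ X_1, X_2\}$
$\vdots$
$\{w <_1 w' <_1 X_{n+1},\ X_1, \ldots, X_n\}$

where $w = \neg\mathit{sat}$ and $w' = \mathit{sat}, g$ if $n$ is odd; and $w = \mathit{sat}, g$ and $w' = \neg\mathit{sat}$ otherwise.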

Obviously, the construction can be done in polynomial time. Intuitively, the rules
in $P_1$ are used to guess a truth assignment for $X_1 \cup \ldots \cup X_{n+1}$. For each such truth
assignment, the rules in $P_2$, $P_3$ and $P_4$ will decide whether the formula $G$ is valid or
not. The intuition behind the orders is to prefer those extended answer sets of $P_\phi$ that
give a counterexample to the validity of $\phi$. Only when such an example does not exist,
i.e. $\phi$ is valid, an extended answer set containing the literal $\mathit{sat}$ will be preferred.

First note that an order relation $<_k = w < w' < X_{n-k+2 \ldots n+1}$ can only prefer an
extended answer set $M_1$ upon $M_2$, i.e. $M_1 \sqsubset_k M_2$, if $M_1 \cap (X_1 \cup \ldots \cup X_{n-k+1}) =
M_2 \cap (X_1 \cup \ldots \cup X_{n-k+1})$; otherwise we have both $M_1 \not\sqsubseteq_k M_2$ and $M_2 \not\sqsubseteq_k M_1$.
Further, when $<_k = \mathit{sat}, g < \neg\mathit{sat} < X_{n-k+2 \ldots n+1}$ (respectively
$<_k = \neg\mathit{sat} < \mathit{sat}, g < X_{n-k+2 \ldots n+1}$), then the $(n-k+2)$th quantifier, denoted
$Q_{n-k+2}$, in $\phi$ is $\exists$ ($\forall$ respectively).

In the sequel we use $\mathcal{M}^k$, with $0 \le k \le n$, to denote the set of extended answer sets
of $P_\phi$ that are preferable up to $<_k$. We will show by induction that $\mathcal{M}^k$ only contains
extended answer sets $M \in \mathcal{M}^k$ with $\mathit{sat} \in M$ iff $Q_{n-k+2} \ldots Q_{n+1} \cdot G$ is valid using
$x_M^{1 \ldots n-k+1}$, i.e. the truth combination over $X_1 \cup \ldots \cup X_{n-k+1}$ in $M$.

The base case, i.e. $k = 0$, holds vacuously, as we have, for each possible truth
combination over $X_1 \cup \ldots \cup X_{n+1}$, an extended answer set $M \in \mathcal{M}^0$ containing
$\mathit{sat} \in M$ if $G$ is valid and $\neg\mathit{sat} \in M$ if $G$ is not.

For the induction step, suppose the claim holds for $\mathcal{M}^{k-1}$ and consider $<_k$ and
$Q_{n-k+2}$. When $Q_{n-k+2} = \exists$, $<_k$ will prefer those extended answer sets in $\mathcal{M}^{k-1}$
containing $\mathit{sat}$ for a fixed truth combination $X$ over $X_1 \cup \ldots \cup X_{n-k+1}$. By the induction
hypothesis, we have that $M \in \mathcal{M}^{k-1}$ with $\mathit{sat} \in M$ iff $Q_{n-k+3} \ldots Q_{n+1} \cdot G$ is valid for
$x_M^{1 \ldots n-k+2}$. Clearly, $Q_{n-k+2} \ldots Q_{n+1} \cdot G$ is then valid for $x_M^{1 \ldots n-k+1}$ iff $\mathcal{M}^k$ contains
an extended answer set $M$ with $\mathit{sat} \in M$.

On the other hand, when $Q_{n-k+2} = \forall$, $<_k$ will prefer those extended answer sets in
$\mathcal{M}^{k-1}$ containing $\neg\mathit{sat}$ for a fixed truth combination $X$ over $X_1 \cup \ldots \cup X_{n-k+1}$. By the
induction hypothesis, we have that $M \in \mathcal{M}^{k-1}$ with $\neg\mathit{sat} \in M$ iff $Q_{n-k+3} \ldots Q_{n+1} \cdot
G$ is not valid for $x_M^{1 \ldots n-k+2}$. Clearly, only when $Q_{n-k+3} \ldots Q_{n+1} \cdot G$ holds for every
combination of $X_{n-k+2}$ with $X$, no extended answer sets with $\neg\mathit{sat}$ will be in $\mathcal{M}^{k-1}$
for $X$, and all those with $\mathit{sat}$ will be passed to $\mathcal{M}^k$, yielding that $Q_{n-k+2} \ldots Q_{n+1} \cdot G$
holds for $x_M^{1 \ldots n-k+1}$ iff $M \in \mathcal{M}^k$ with $\mathit{sat} \in M$.

Finally, by induction the above claim holds for $\mathcal{M}^n$, i.e. the preferred answer sets, which
implies that $\phi$ is valid iff $\mathcal{M}^n$ contains a preferred answer set $M$ containing $\mathit{sat}$, i.e.
$\exists M \in \mathcal{M}^n \cdot \mathit{sat} \in M$, from which the theorem readily follows. □

Theorem 5. The problem of deciding, given a LOLP $(P, (<_i)_{i=1 \ldots n})$ and a literal $l$,
whether every preferred answer set contains $l$ is $\Pi^P_{n+1}$-hard.

Proof. Reconsider the LOLP in the proof of Theorem 4. Let $l$ be a fresh atom not
occurring in $P_\phi$ and define $P'_\phi$ as $P_\phi$ with two extra rules $l \leftarrow$ and $\neg l \leftarrow$. Clearly,
showing that $l$ does not occur in every preferred answer set is the same as showing that
$\neg l$ occurs in some preferred answer set. Deciding the latter is $\Sigma^P_{n+1}$-hard by Theorem 4;
thus deciding the complement of the former is $\Pi^P_{n+1}$-hard. □

The following is immediate from Theorems 2, 3, 4 and 5.

Corollary 2. The problem of deciding, given an arbitrary LOLP $(P, (<_i)_{i=1 \ldots n})$ and
a literal $l$, whether there is a preferred answer set containing $l$ is $\Sigma^P_{n+1}$-complete. On the
other hand, deciding whether every preferred answer set contains $l$ is $\Pi^P_{n+1}$-complete.



5 Weak Constraints 

Weak constraints were introduced in [8] as a relaxation of the concept of a constraint.
Intuitively, a weak constraint is allowed to be violated, but only as a last resort, meaning
that one tries to minimize the set of violated constraints. Here minimization is typically
interpreted as either subset minimality or cardinality minimality. In the former, we pre-
fer a solution that violates a set of weak constraints $C_1$ over one that violates a set $C_2$
iff $C_1 \subset C_2$, while in the latter, we would only need that $C_1$ contains fewer violated
constraints than $C_2$, i.e. $|C_1| < |C_2|$.

Subset minimality is obviously less controversial since, for cardinality minimality,
it may happen that, while $|C_1| < |C_2|$, $C_1$ contains more important constraints than
$C_2$.$^7$

In [8] a semantics for hierarchies of weak constraints is defined, where one mini-
mizes constraints on lower levels, before minimizing, among the results of the previous
levels, constraints on higher levels. Formally, weak constraints have the same syntac-
tic form as constraints, i.e. $\leftarrow \beta$ with $\beta$ a set of extended literals. We then assign the
weak constraints for a certain level $i$ to a set $W_i$, similar to [8], and define a weak logic
program as consisting of a program and a hierarchy of sets of weak constraints.

Definition 5. A weak logic program (WLP) is a pair $(P, W)$ where $P$ is a program
and $W$ is a set $\{W_1, \ldots, W_n\}$, with each $W_i$, $1 \le i \le n$, a set of weak constraints.

To enhance readability of the following definition, we assume an empty dummy set
$W_0$ of weak constraints.

7 A similar preference for subset minimality over cardinality minimality is also common in, 
for example, the domain of diagnosis [25, 31], where one tries to minimize the set of causes 
responsible for certain failures. 

Definition 6. Let $(P, W)$ be a WLP. The extended answer sets of $P$ are preferable up
to $W_0$. An extended answer set $M$ of $P$ is preferable up to $W_i$, $1 \le i \le n$, if

- $M$ is preferable up to $W_{i-1}$, and

- there is no $N$, preferable up to $W_{i-1}$, such that $W_i^N \subset W_i^M$, where $W_i^N = \{c \mid
c \in W_i \wedge N \not\models c\}$, i.e. the constraints in $W_i$ that are violated by $N$.

An extended answer set of $P$ is a preferred answer set of a WLP $(P, W)$ if it is prefer-
able up to $W_n$.

LOLPs can easily implement weak constraints. Intuitively, each order $<_i$ in the
hierarchy will try to minimize the violation of weak constraints in $W_i$.

For a WLP $(P, \{W_1, \ldots, W_n\})$, define the LOLP $(P \cup WC, (<_i)_{i=1 \ldots n})$ with
$WC = \{c \leftarrow \beta \mid c = (\leftarrow \beta) \in W_i, 1 \le i \le n\}$ representing the weak constraints by
rules with new atoms $c$, one for each constraint $\leftarrow \beta$, and each order $<_i$ in $(<_i)_{i=1 \ldots n}$
defined by

$\{\mathit{not}\ c_i <_i c_i \mid c_i \in W_i\}$.

The orders prefer extended answer sets that do not contain $c$ since $c$ can only be
obtained by applying $c \leftarrow \beta$, corresponding to a violation of the corresponding original
constraint $\leftarrow \beta$.
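The translation just described is purely syntactic, as the following Python sketch suggests. It is an assumption-laden illustration: constraint bodies are plain strings, the fresh atoms are named `c1`, `c2`, ... by a hypothetical numbering scheme, rules are printed with `<-`, and each order is returned as a list of pairs `(better, worse)` before any expansion is applied.

```python
def wlp_to_lolp(levels):
    """Translate a hierarchy of weak constraints into rules and orders.

    levels: list of levels, each a list of constraint bodies (strings).
    Returns (rules, orders): one rule "c <- body" per constraint, and per
    level the pairs ("not c", "c") stating that not deriving c is preferred.
    """
    rules, orders, counter = [], [], 0
    for bodies in levels:
        order = []
        for body in bodies:
            counter += 1
            atom = f"c{counter}"              # fresh atom for this constraint
            rules.append(f"{atom} <- {body}")
            order.append((f"not {atom}", atom))
        orders.append(order)
    return rules, orders


# Two hypothetical levels: one constraint "<- p, not q", then "<- r".
RULES, ORDERS = wlp_to_lolp([["p, not q"], ["r"]])
for rule in RULES:
    print(rule)
for level, order in enumerate(ORDERS, start=1):
    print(level, order)
```

The rules returned here would be added to the original program $P$, and each list of pairs would serve as the order $<_i$ for its level.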

Theorem 6. An extended answer set $M$ of a WLP $(P, W)$ is preferred iff $M \cup \{c \mid c \in
W, M \not\models c\}$ is a preferred answer set of the LOLP $(P \cup WC, (<_i)_{i=1 \ldots n})$.

The other approach to minimize the violation of weak constraints, is to take into 
account the cardinality of the sets of violated weak constraints, as in [8]. The following 
definition formalizes the notion of cardinality preferred, or c-preferred for short, answer 
sets. 

Definition 7. Let $(P, W)$ be a WLP. The extended answer sets of $P$ are c-preferable up
to $W_0$. An extended answer set $M$ of $P$ is c-preferable up to $W_i$, $1 \le i \le n$, if

- $M$ is c-preferable up to $W_{i-1}$, and

- there is no $N$, c-preferable up to $W_{i-1}$, such that $|W_i^N| < |W_i^M|$.

An extended answer set of $P$ is a c-preferred answer set of a WLP $(P, W)$ if it is
c-preferable up to $W_n$.

In the special case that the preferable answer sets on a level are c-preferable we
have that, on the next level, the c-preferable answer sets are preferable. Denote the set
of extended answer sets that are c-preferable up to $W_i$ as $\mathcal{M}^i_c$ and the set of extended
answer sets preferable up to $W_i$ as $\mathcal{M}^i$.

Theorem 7. Let $(P, W)$ be a WLP and let $\mathcal{M}^{i-1}_c = \mathcal{M}^{i-1}$ for some $1 \le i \le n$. Then
$\mathcal{M}^i_c \subseteq \mathcal{M}^i$.

The pre-condition that every extended answer set, preferable up to $W_{i-1}$, has to be
c-preferable is necessary, as can be seen from the following example, where we have a
c-preferred answer set that is not preferred.

Example 4. Take a WLP $(P, \{W_1, W_2\})$ with $P$ the program

¬a ←
a ←
b ← a

and the weak constraints $W_1 = \{c_1: \leftarrow \neg a\ ;\ c_2: \leftarrow a\ ;\ c_3: \leftarrow b\}$ and the second level
$W_2 = \{c_4: \leftarrow \neg a, \mathit{not}\ b\}$. The program $P$ has two extended answer sets $M = \{\neg a\}$
and $N = \{a, b\}$. This leads to the following sets of violated constraints: $W_1^M = \{c_1\}$,
$W_1^N = \{c_2, c_3\}$, $W_2^M = \{c_4\}$, and $W_2^N = \emptyset$. Then, $M$ is c-preferable up to $W_1$, while
$N$ is not. Both $M$ and $N$ are preferable up to $W_1$. However, $M$ is c-preferable up to
$W_2$, since there are no other extended answer sets that are c-preferable up to $W_1$, while
$M$ is not preferable up to $W_2$.
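The two filtering schemes of Definitions 6 and 7 differ only in how sets of violated constraints are compared, which the following Python sketch makes explicit for Example 4. It is illustrative only: the two extended answer sets are entered by hand, classical negation is written `-a`, negation as failure is written `not b`, and the names `holds`, `violated` and `filter_levels` are hypothetical.

```python
def holds(ext_literal, interp):
    """Truth of an extended literal ("l" or "not l") in an interpretation."""
    if ext_literal.startswith("not "):
        return ext_literal[4:] not in interp
    return ext_literal in interp


def violated(level, interp):
    """Names of the weak constraints of one level whose body holds in interp."""
    return {name for name, body in level.items()
            if all(holds(lit, interp) for lit in body)}


def filter_levels(answer_sets, levels, better):
    """Keep, level by level, the sets for which no survivor is strictly better."""
    current = list(answer_sets)
    for level in levels:
        viol = {m: violated(level, m) for m in current}
        current = [m for m in current
                   if not any(better(viol[n], viol[m]) for n in current)]
    return current


def subset_better(vn, vm):
    return vn < vm             # proper subset, as in Definition 6


def card_better(vn, vm):
    return len(vn) < len(vm)   # fewer violations, as in Definition 7


M = frozenset({"-a"})
N = frozenset({"a", "b"})
W1 = {"c1": ["-a"], "c2": ["a"], "c3": ["b"]}
W2 = {"c4": ["-a", "not b"]}

print(filter_levels([M, N], [W1, W2], subset_better))  # preferred
print(filter_levels([M, N], [W1, W2], card_better))    # c-preferred
```

Under subset comparison both sets survive the first level and the second level then removes $M$; under cardinality comparison $N$ is already gone after the first level, so $M$ survives unchallenged.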

If there is only one level of weak constraints we have the attractive property that 
c-preferred answer sets are preferred. 

Corollary 3. Let $(P, \{W_1\})$ be a WLP with one level of weak constraints. A c-preferred
answer set of $(P, \{W_1\})$ is preferred.

In this case, if the preferred answer sets are already computed, and one decides later 
on that c-preferred answer sets are needed, the search space can be restricted to just the 
preferred answer sets instead of all extended answer sets. 

6 Conclusions and Directions for Further Research 

Equipping logic programs with a preference relation on the rules has a relatively long
history [21, 20, 18, 9, 7, 5, 32, 1, 29]. Approaches that use a preference relation
on (extended) literals have also been studied: [26] proposes explicit preferences while [4,
6] encode dynamic preferences within the program.

In this paper, we applied such preferences on the extended answer set semantics, 
thus allowing the selection of preferred “approximate” answer sets for inconsistent pro- 
grams. We also considered a natural extension, linearly ordered programs, where there 
are several preference relations. This extension increases the expressiveness of the re- 
sulting formalism to cover the polynomial hierarchy. 

Such preference hierarchies occur naturally in several application areas such as 
timetabling. As an application of the approach, we have shown that hierarchically struc- 
tured weak constraints can be considered as a special case of linearly ordered programs. 

Future work may generalize the structure of the preference relations, e.g. to arbitrary 
partial orders or to cyclic structures, where the latter may provide a natural model for 
agent communication. 

A brute force prototype implementation for LOLPs is available which uses an ex- 
isting answer set solver to generate all extended answer sets, and then filters out the 
preferred ones taking into account the given preference levels. A dedicated implementa- 
tion, using existing answer set solvers, could, similarly to [6], compute preferred answer 
sets more directly by generating one extended answer set and then trying to generate a 
better one using an augmented program, which, when applied in a fixpoint computation, 
results in a preferred answer set. 



On Programs with Linearly Ordered Multiple Preferences 193 



References 

1. José Júlio Alferes and Luís Moniz Pereira. Updates plus preferences. In Ojeda-Aciego et al.
[23], pages 345-360.

2. Marcelo Arenas, Leopoldo Bertossi, and Jan Chomicki. Specifying and querying database
repairs using logic programs with exceptions. In Proceedings of the 4th International Confer-
ence on Flexible Query Answering Systems, pages 27-41, Warsaw, October 2000. Springer-
Verlag.

3. Chitta Baral. Knowledge Representation, Reasoning and Declarative Problem Solving. Cam-
bridge University Press, 2003.

4. G. Brewka. Logic programming with ordered disjunction. In Proceedings of the 18th Na-
tional Conference on Artificial Intelligence and Fourteenth Conference on Innovative Ap-
plications of Artificial Intelligence, pages 100-105, Edmonton, Canada, July 2002. AAAI
Press.

5. Gerhard Brewka and Thomas Eiter. Preferred answer sets for extended logic programs. Arti-
ficial Intelligence, 109(1-2):297-356, April 1999.

6. Gerhard Brewka, Ilkka Niemelä, and Tommi Syrjänen. Implementing ordered disjunction
using answer set solvers for normal programs. In Flesca et al. [17], pages 444-455.

7. Francesco Buccafurri, Wolfgang Faber, and Nicola Leone. Disjunctive logic programs with
inheritance. In Danny De Schreye, editor, Logic Programming: The 1999 International Con-
ference, pages 79-93, Las Cruces, New Mexico, December 1999. MIT Press.

8. Francesco Buccafurri, Nicola Leone, and Pasquale Rullo. Strong and weak constraints in dis-
junctive datalog. In Proceedings of the 4th International Conference on Logic Programming
(LPNMR ’97), pages 2-17, 1997.

9. Francesco Buccafurri, Nicola Leone, and Pasquale Rullo. Disjunctive ordered logic: Seman-
tics and expressiveness. In Anthony G. Cohn, Lenhard K. Schubert, and Stuart C. Shapiro,
editors, Proceedings of the 6th International Conference on Principles of Knowledge Repre-
sentation and Reasoning, pages 418-431, Trento, June 1998. Morgan Kaufmann.

10. Francesco Buccafurri, Nicola Leone, and Pasquale Rullo. Enhancing disjunctive datalog by
constraints. Knowledge and Data Engineering, 12(5):845-860, 2000.

11. Marina De Vos and Dirk Vermeir. Choice Logic Programs and Nash Equilibria in Strate-
gic Games. In Jörg Flum and Mario Rodríguez-Artalejo, editors, Computer Science Logic
(CSL’99), volume 1683 of Lecture Notes in Computer Science, pages 266-276, Madrid,
Spain, 1999. Springer Verlag.

12. Thomas Eiter, Wolfgang Faber, Nicola Leone, and Gerald Pfeifer. The diagnosis frontend of
the dlv system. AI Communications, 12(1-2):99-111, 1999.

13. Thomas Eiter, Wolfgang Faber, Nicola Leone, Gerald Pfeifer, and Axel Polleres. Planning 
under incomplete knowledge. In John W. Lloyd, Veronica Dahl, Ulrich Furbach, Manfred 
Kerber, Kung-Kiu Lau, Catuscia Palamidessi, Luis Moniz Pereira, Yehoshua Sagiv, and 
Peter J. Stuckey, editors, Proceedings of the First International Conference on Computa- 
tional Logic (CL2000), volume 1861 of Lecture Notes in Computer Science, pages 807-821. 
Springer, 2000. 

14. Thomas Eiter, Wolfgang Faber, Nicola Leone, Gerald Pfeifer, and Axel Polleres. The DLV$^K$
planning system. In Flesca et al. [17], pages 541-544.

15. Thomas Eiter, Michael Fink, Giuliana Sabbatini, and Hans Tompits. Considerations on up- 
dates of logic programs. In Ojeda-Aciego et al. [23], pages 2-20. 

16. Wolfgang Faber, Nicola Leone, and Gerald Pfeifer. Representing school timetabling in a 
disjunctive logic programming language. In Proceedings of the 13th Workshop on Logic 
Programming (WLP ’98), 1998. 






17. Sergio Flesca, Sergio Greco, Nicola Leone, and Giovambattista Ianni, editors. Logic in Ar- 
tificial Intelligence, volume 2424 of Lecture Notes in Artificial Intelligence, Cosenza, Italy, 
September 2002. Springer Verlag. 

18. D. Gabbay, E. Laenens, and D. Vermeir. Credulous vs. Sceptical Semantics for Ordered 
Logic Programs. In J. Allen, R. Fikes, and E. Sandewall, editors, Proceedings of the 2nd 
International Conference on Principles of Knowledge Representation and Reasoning, pages 
208-217, Cambridge, Mass, 1991. Morgan Kaufmann. 

19. Michael Gelfond and Vladimir Lifschitz. The stable model semantics for logic programming.
In Robert A. Kowalski and Kenneth A. Bowen, editors, Logic Programming, Proceedings of
the Fifth International Conference and Symposium, pages 1070-1080, Seattle, Washington,
August 1988. The MIT Press.

20. Robert A. Kowalski and Fariba Sadri. Logic programs with exceptions. In David H. D. War- 
ren and Peter Szeredi, editors, Proceedings of the 7th International Conference on Logic 
Programming, pages 598-613, Jerusalem, 1990. The MIT Press. 

21. Els Laenens and Dirk Vermeir. A logical basis for object oriented programming. In Jan van 
Eijck, editor, European Workshop, JELIA 90, volume 478 of Lecture Notes in Artificial In- 
telligence, pages 317-332, Amsterdam, The Netherlands, September 1990. Springer Verlag. 

22. Vladimir Lifschitz. Answer set programming and plan generation. Journal of Artificial Intel-
ligence, 138(1-2):39-54, 2002.

23. Manuel Ojeda-Aciego, Inma P. de Guzmán, Gerhard Brewka, and Luís Moniz Pereira, edi-
tors. Logic in Artificial Intelligence, volume 1919 of Lecture Notes in Artificial Intelligence,
Malaga, Spain, September-October 2000. Springer Verlag.

24. Christos H. Papadimitriou. Computational Complexity. Addison Wesley, 1994. 

25. Raymond Reiter. A theory of diagnosis from first principles. Artificial Intelligence, 32(1):57-
95, 1987.

26. Chiaki Sakama and Katsumi Inoue. Representing priorities in logic programs. In Michael J. 
Maher, editor, Proceedings of the 1996 Joint International Conference and Symposium on 
Logic Programming, pages 82-96, Bonn, September 1996. MIT Press. 

27. T. Soininen and I. Niemelä. Developing a declarative rule language for applications in prod-
uct configuration. In Proceedings of the First International Workshop on Practical Aspects
of Declarative Languages (PADL ’99), Lecture Notes in Computer Science, San Antonio,
Texas, 1999. Springer Verlag.

28. T. Soininen, I. Niemelä, J. Tiihonen, and R. Sulonen. Representing configuration knowledge
with weight constraint rules. In Proceedings of the AAAI Spring 2001 Symposium on Answer
Set Programming: Towards Efficient and Scalable Knowledge, Stanford, USA, 2001.

29. Davy Van Nieuwenborgh and Dirk Vermeir. Preferred answer sets for ordered logic pro- 
grams. In Flesca et al. [17], pages 432-443. 

30. Davy Van Nieuwenborgh and Dirk Vermeir. Order and negation as failure. In Catuscia 
Palamidessi, editor, Proceedings of the International Conference on Logic Programming, 
volume 2916 of Lecture Notes in Computer Science, pages 194-208. Springer, 2003. 

31. Davy Van Nieuwenborgh and Dirk Vermeir. Ordered diagnosis. In Proceedings of the 10th 
International Conference on Logic for Programming, Artificial Intelligence, and Reason- 
ing (LPAR2003), volume 2850 of Lecture Notes in Artificial Intelligence, pages 244-258. 
Springer Verlag, 2003. 

32. Kewen Wang, Lizhu Zhou, and Fangzhen Lin. Alternating fixpoint theory for logic pro-
grams with priority. In Proceedings of the International Conference on Computational Logic
(CL2000), volume 1861 of Lecture Notes in Computer Science, pages 164-178. Springer,
2000.



Splitting an Operator 

An Algebraic Modularity Result 
and Its Application to Logic Programming 



Joost Vennekens, David Gilis, and Marc Denecker 



Department of Computer Science, K.U. Leuven 
Celestijnenlaan 200A 
B-3001 Leuven, Belgium 



Abstract. It is well known that, under certain conditions, it is possi- 
ble to split logic programs under stable model semantics, i.e. to divide 
such a program into a number of different “levels”, such that the models
of the entire program can be constructed by incrementally constructing 
models for each level. Similar results exist for other non-monotonic for- 
malisms, such as auto-epistemic logic and default logic. In this work, we 
present a general, algebraic splitting theory for programs/theories un- 
der a fixpoint semantics. Together with the framework of approximation 
theory, a general fixpoint theory for arbitrary operators, this gives us a 
uniform and powerful way of deriving splitting results for each logic with 
a fixpoint semantics. We demonstrate the usefulness of these results, by 
generalizing Lifschitz and Turner’s splitting theorem to other semantics 
for (non-disjunctive) logic programs. 



1 Introduction 

An important aspect of human reasoning is that it is often incremental in nature. 
When dealing with a complex domain, we tend to initially restrict ourselves to 
a small subset of all relevant concepts. Once these “basic” concepts have been 
figured out, we then build another, more “advanced”, layer of concepts on this 
knowledge. A quite illustrative example of this can be found in most textbooks 
on computer networking. These typically present a seven-layered model of the 
way in which computers communicate. First, in the so-called physical layer, 
there is only talk of hardware and concepts such as wires, cables and electronic 
pulses. Once these low-level issues have been dealt with, the resulting knowledge 
becomes a fixed base, upon which a new layer, the data-link layer, is built. 
This no longer considers wires and cables and so on, but rather talks about 
packets of information travelling from one computer to another. Once again,
after the workings of this layer have been figured out, this information is taken 
“for granted” and becomes part of the foundation upon which a new layer is 
built. This process continues all the way up to a seventh layer, the application 
layer, and together all of these layers describe the operation of the entire system. 

In this paper, we investigate a formal equivalent of this method. More specifi- 
cally, we address the question of whether a formal theory in some non-monotonic 
language can be split into a number of different levels or strata, such that the
formal semantics of the entire theory can be constructed by successively
constructing the semantics of the various strata. (We use the terms “stratification”
and “splitting” interchangeably to denote a division into a number of different
levels. This is a more general use of both terms than in literature such as
[Gel87].) Such stratifications are interesting from both a practical and a more
theoretical, knowledge representational point of view. For instance, computing 
models of a stratified version of a theory is often significantly faster than comput- 
ing models of the original theory. Furthermore, in order to be able to build and 
maintain large knowledge bases, it is crucial to know which parts of a theory can 
be analysed or constructed independently and, conversely, whether combining 
several correct theories will have any unexpected side-effects. 

It is therefore not surprising that this issue has already been intensively 
studied. Indeed, splitting results have been proven for auto-epistemic logic under 
the semantics of expansions [GP92,NR94], default logic under the semantics of
extensions [Tur96], and various kinds of logic programs under the stable model
semantics [LT94,EL04]. In all of these works, stratification is seen as a syntactical 
property of a theory in a certain language under a certain formal semantics. 

In this work, we take a different approach to studying this topic. The se- 
mantics of several (non-monotonic) logics can be expressed through fixpoint 
characterizations in some lattice of semantic structures. In such a semantics, the 
meaning of a theory is described by an operator, which revises proposed “states 
of affairs”. The models of a theory are those states which no longer have to be 
revised. Knowing such a revision operator for a theory should suffice to know
whether it is stratifiable: this will be the case if no higher levels are ever used 
to revise the state of affairs for lower-level concepts. This motivates us to study 
the stratification of these revision operators themselves. As such, we are able 
to develop a general theory of stratification at an abstract, algebraic level and 
apply its results to each formalism which has a fixpoint semantics. 

This approach is especially powerful when combined with the framework of 
approximation theory, a general fixpoint theory for arbitrary operators, which 
has already proved highly useful in the study of non-monotonic reasoning. It 
naturally captures, for instance, (most of) the common semantics of logic
programming [DMT00], auto-epistemic logic [DMT03] and default logic [DMT03].
As such, studying stratification within this framework allows our abstract
results to be directly and easily applicable to logic programming, auto-epistemic
logic and default logic.

Studying stratification at this more semantical level has three distinct ad- 
vantages. First of all, it avoids duplication of effort, as the same algebraic theory 
takes care of stratification in logic programming, auto-epistemic logic, default 
logic and indeed any logic with a fixpoint semantics. Secondly, our results can 
be used to easily extend existing results to other (fixpoint) semantics of the 
aforementioned languages. Finally, our work also offers greater insight into the 
general principles underlying various known stratification results, as we are able 
to study this issue in itself, free of being restricted to a particular syntax or 
semantics. 

This paper is structured in the following way. In section 2, some basic notions
from lattice theory are introduced and a brief introduction to the main concepts
of approximation theory is given. Section 3 is the main part of this work, in
which we present our algebraic theory of stratifiable operators. In section 4, we
then show how these results can be applied to logic programming. We would like
to stress that, although space restrictions prevent us from demonstrating this here,
a similar treatment exists for both auto-epistemic logic and default logic.

2 Preliminaries 

2.1 Orders and Lattices 

A binary relation $\leq$ on a set $S$ is a partial order if it is reflexive, transitive and
anti-symmetric. An element $x \in S$ is a central element if it is comparable to
each other element of $S$, i.e. $x \leq y$ or $x \geq y$ for each $y \in S$. For each subset $R$
of $S$, an element $l$ of $S$ such that $l \leq r$ for all $r \in R$ is a lower bound of $R$. An
element $g$ in $S$ such that $g$ is a lower bound of $R$ and, for each other lower bound
$l$ of $R$, $l \leq g$, is called the greatest lower bound of $R$, denoted $glb(R)$. Similarly,
an element $u$ such that $u \geq r$ for each $r \in R$ is an upper bound of $R$, and if one
such upper bound is less than or equal to each other upper bound of $R$, it is the least
upper bound $lub(R)$ of $R$.

A partial order $\leq$ on a set $S$ is well-founded if each non-empty subset $R$ of $S$
has a minimal element; it is total if every two elements $x, y \in S$ are comparable,
i.e. $x \leq y$ or $x \geq y$.

A pair $(L, \leq)$ is a lattice if $\leq$ is a partial order on a non-empty set $L$, such
that each two elements $x, y$ of $L$ have a greatest lower bound $glb(x, y)$ and a least
upper bound $lub(x, y)$. A lattice $(L, \leq)$ is complete if each subset $L'$ of $L$ has a
greatest lower bound $glb(L')$ and least upper bound $lub(L')$. By definition, such
a lattice has a minimal (or bottom) element $\bot$ and a maximal (or top) element
$\top$. Often, we will not explicitly mention the partial order $\leq$ of a lattice $(L, \leq)$
and simply speak of the lattice $L$.

An operator $O$ is a function from a lattice to itself. An operator on a lattice
$L$ is monotone if for each $x, y \in L$ such that $x \leq y$, $O(x) \leq O(y)$. An element
$x$ in $L$ is a fixpoint of $O$ if $O(x) = x$. We denote the set of all fixpoints of $O$ by
$fp(O)$. A fixpoint $x$ of $O$, such that for each other fixpoint $y$ of $O$, $x \leq y$, is the
least fixpoint $lfp(O)$ of $O$. It can be shown [Tar55] that each monotone operator
on a complete lattice has such a unique least fixpoint.
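
On a finite lattice, this least fixpoint can be reached by simply iterating the operator from the bottom element. The following Python sketch is only an illustration under that finiteness assumption: it works on the powerset of a small set ordered by inclusion, and the names `lfp` and `example_op` are hypothetical and do not come from the paper.

```python
def lfp(op, bottom=frozenset()):
    """Least fixpoint of a monotone operator on a finite lattice,
    obtained by iterating from the bottom element."""
    x = bottom
    while True:
        nxt = op(x)
        if nxt == x:
            return x
        x = nxt


def example_op(s):
    """Hypothetical monotone operator on the powerset of {"a", "b", "c"}:
    "a" is always derived, "b" is derived once "a" is present."""
    derived = {"a"}
    if "a" in s:
        derived.add("b")
    return frozenset(derived)


print(sorted(lfp(example_op)))
```

The iteration visits the empty set, then {a}, then {a, b}, where it stabilises. For infinite lattices the same idea needs transfinite iteration, which this sketch does not attempt.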



2.2 Approximation Theory 

Approximation theory is a general fixpoint theory for arbitrary operators, which 
generalizes ideas found in, among others, [BS91] and [Fit89]. Our presentation
of this theory is based on [DMT00].

Let $(L, \leq)$ be a lattice. An element $(x, y)$ of the square $L^2$ of the domain of
such a lattice can be seen as denoting an interval $[x, y] = \{z \in L \mid x \leq z \leq y\}$.
Using this intuition, we can derive a precision order $\leq_p$ on the set $L^2$ from the
order $\leq$ on $L$: for each $x, y, x', y' \in L$, $(x, y) \leq_p (x', y')$ iff $x \leq x'$ and $y' \leq y$.
Indeed, if $(x, y) \leq_p (x', y')$, then $[x, y] \supseteq [x', y']$. It can easily be shown that
$(L^2, \leq_p)$ is also a lattice, which we will call the bilattice corresponding to $L$.
Moreover, if $L$ is complete, then so is $L^2$. As an interval $[x, x]$ contains precisely
one element, namely $x$ itself, elements $(x, x)$ of $L^2$ are called exact. The set of all
exact elements of $L^2$ forms a natural embedding of $L$ in $L^2$. A pair $(x, y)$ only
corresponds to a non-empty interval if $x \leq y$. Such pairs are called consistent.
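
As an illustration of these notions, the following Python sketch instantiates them for a powerset lattice, where the order is set inclusion. The function names `leq_p`, `is_exact` and `is_consistent` are hypothetical and chosen only for this example.

```python
def leq_p(p, q):
    """Precision order: (x, y) <=_p (x', y') iff x <= x' and y' <= y."""
    (x, y), (x2, y2) = p, q
    return x <= x2 and y2 <= y


def is_exact(p):
    """A pair (x, x) denotes the one-element interval [x, x]."""
    return p[0] == p[1]


def is_consistent(p):
    """A pair (x, y) denotes a non-empty interval iff x <= y."""
    return p[0] <= p[1]


empty, a, ab = frozenset(), frozenset("a"), frozenset("ab")
print(leq_p((empty, ab), (a, ab)))
print(is_exact((a, a)), is_consistent((ab, a)))
```

Here `<=` on `frozenset` values is the subset test, so a more precise pair has a larger lower bound and a smaller upper bound.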

Approximation theory is based on the study of operators on bilattices $L^2$
which are monotone w.r.t. the precision order $\leq_p$. Such operators are called
approximations. For an approximation $A$ and $x, y \in L$, we denote by $A^1(x, y)$
and $A^2(x, y)$ the unique elements of $L$ for which $A(x, y) = (A^1(x, y), A^2(x, y))$.
An approximation approximates an operator $O$ on $L$ if for each $x \in L$, $A(x, x)$
contains $O(x)$, i.e. $A^1(x, x) \leq O(x) \leq A^2(x, x)$. An exact approximation is one
which maps exact elements to exact elements, i.e. $A^1(x, x) = A^2(x, x)$ for all
$x \in L$. Similarly, a consistent approximation maps consistent elements to consistent
elements, i.e. if $x \leq y$ then $A^1(x, y) \leq A^2(x, y)$. An inconsistent approximation
cannot approximate any operator. Each exact approximation is consistent and
approximates a unique operator $O$ on $L$, namely that which maps each $x \in L$ to
$A^1(x, x)$. An approximation is symmetric if for each pair $(x, y) \in L^2$, if
$A(x, y) = (x', y')$ then $A(y, x) = (y', x')$. Each symmetric approximation is also exact.

For an approximation $A$ on $L^2$, the following two operators on $L$ can be
defined: the function $A^1(\cdot, y)$ maps an element $x \in L$ to $A^1(x, y)$, i.e.
$A^1(\cdot, y) = \lambda x. A^1(x, y)$, and the function $A^2(x, \cdot)$ maps an element $y \in L$
to $A^2(x, y)$, i.e. $A^2(x, \cdot) = \lambda y. A^2(x, y)$. As all such operators are monotone,
they all have a unique least fixpoint. We define an operator $C_A^{\downarrow}$ on $L$, which maps
each $y \in L$ to $lfp(A^1(\cdot, y))$ and, similarly, an operator $C_A^{\uparrow}$, which maps each
$x \in L$ to $lfp(A^2(x, \cdot))$. $C_A^{\downarrow}$ is called the lower stable operator of $A$, while
$C_A^{\uparrow}$ is the upper stable operator of $A$. Both these operators are anti-monotone.
Combining these two operators, the operator $C_A$ on $L^2$ maps each pair $(x, y)$ to
$(C_A^{\downarrow}(y), C_A^{\uparrow}(x))$. This operator is called the partial stable operator of $A$.
Because the lower and upper partial stable operators $C_A^{\downarrow}$ and $C_A^{\uparrow}$ are
anti-monotone, the partial stable operator $C_A$ is monotone. Note that if an
approximation $A$ is symmetric, its lower and upper partial stable operators will
always be equal, i.e. $C_A^{\downarrow} = C_A^{\uparrow}$.

An approximation defines a number of different fixpoints: its least fixpoint is
called its Kripke-Kleene fixpoint, fixpoints of its partial stable operator $C_A$ are
stable fixpoints and the least fixpoint of $C_A$ is called the well-founded fixpoint of
$A$. As shown in [DMT00] and [DMT03], these fixpoints correspond to various
semantics of logic programming, auto-epistemic logic and default logic.

Finally, it should be noted that the concept of an approximation as defined
in [DMT00] corresponds to our definition of a symmetric approximation.

3 Stratification of Operators 

In this section, we develop a theory of stratifiable operators. We will, in section
3.2, investigate operators on a special kind of lattice, namely product lattices,
which will be introduced in section 3.1. In section 3.3, we then return to
approximation theory and discuss stratifiable approximations on product lattices.



3.1 Product Lattices 

We begin by defining the notion of a product set, which is a generalization of the
well-known concept of cartesian products.

Definition 3.1.1. Let $I$ be a set, which we will call the index set of the product
set, and for each $i \in I$, let $S_i$ be a set. The product set $\bigotimes_{i \in I} S_i$ is the following
set of functions: $\bigotimes_{i \in I} S_i = \{ f \mid f : I \to \bigcup_{i \in I} S_i \text{ such that } \forall i \in I : f(i) \in S_i \}$.

Intuitively, a product set $\bigotimes_{i \in I} S_i$ contains all ways of selecting one element
from each set $S_i$. As such, if $I$ is a set with $n$ elements, e.g. the set $\{1, \ldots, n\}$, the
product set $\bigotimes_{i \in I} S_i$ is simply (isomorphic to) the cartesian product $S_1 \times \cdots \times S_n$.
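
For a finite index set, the product set can be enumerated directly. The sketch below is a hypothetical illustration in Python: it represents each function $f$ as a dictionary from indices to chosen elements, and the family `S` is made up for the example.

```python
from itertools import product


def product_set(sets):
    """All functions f (as dicts) with f(i) in sets[i] for every index i."""
    indices = sorted(sets)
    return [dict(zip(indices, choice))
            for choice in product(*(sorted(sets[i]) for i in indices))]


S = {1: {"a", "b"}, 2: {"c"}, 3: {"d", "e"}}  # hypothetical family of sets
elements = product_set(S)
print(len(elements))
```

Each dictionary picks one element per index, which is exactly the "one element from each set" reading given above.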

Definition 3.1.2. Let $I$ be a set and for each $i \in I$, let $(S_i, \leq_i)$ be a partially
ordered set. The product order $\leq_\otimes$ on the set $\bigotimes_{i \in I} S_i$ is defined as:
$\forall x, y \in \bigotimes_{i \in I} S_i$ : $x \leq_\otimes y$ iff $\forall i \in I : x(i) \leq_i y(i)$.

It can easily be shown that if all of the partially ordered sets $S_i$ are (complete)
lattices, the product set $\bigotimes_{i \in I} S_i$, together with its product order $\leq_\otimes$, is also a
(complete) lattice. We therefore refer to the pair $(\bigotimes_{i \in I} S_i, \leq_\otimes)$ as the product
lattice of lattices $S_i$.

From now on, we will only consider product lattices with a well-founded index
set, i.e. index sets $I$ with a partial order $\preceq$ such that each non-empty subset of
$I$ has a $\preceq$-minimal element. This will allow us to use inductive arguments in
dealing with elements of product lattices. Most of our results, however, also hold
for index sets with an arbitrary partial order; if a certain proof depends on the
well-foundedness of $I$, we will always explicitly mention this.

In the next sections, the following notations will be used. For a function
$f : A \to B$ and a subset $A'$ of $A$, we denote by $f|_{A'}$ the restriction of $f$ to $A'$,
i.e. $f|_{A'} : A' \to B : a' \mapsto f(a')$. For an element $x$ of a product lattice
$\bigotimes_{i \in I} L_i$ and an $i \in I$, we abbreviate $x|_{\{j \in I \mid j \preceq i\}}$ by $x|_{\preceq i}$. We also use
similar abbreviations $x|_{\prec i}$, $x|_i$ and $x|_{\not\preceq i}$. If $i$ is a minimal element of the
well-founded set $I$, $x|_{\prec i}$ is defined as the empty function. For each index $i$, the
set $\{x|_{\preceq i} \mid x \in L\}$, ordered by the appropriate restriction $\leq_\otimes|_{\preceq i}$ of the product
order, is also a lattice. Clearly, this sublattice of $L$ is isomorphic to the product
lattice $\bigotimes_{j \preceq i} L_j$. We denote this sublattice by $L|_{\preceq i}$ and use a similar notation
$L|_{\prec i}$ for $\bigotimes_{j \prec i} L_j$.

If $f$, $g$ are functions $f : A \to B$, $g : C \to D$ and the domains $A$ and $C$ are
disjoint, we denote by $f \sqcup g$ the function from $A \cup C$ to $B \cup D$, such that for all
$a \in A$, $(f \sqcup g)(a) = f(a)$ and for all $c \in C$, $(f \sqcup g)(c) = g(c)$. Furthermore, for any
$g$ whose domain is disjoint from the domain of $f$, we call $f \sqcup g$ an extension of
$f$. For each element $x$ of a product lattice $L$ and each index $i \in I$, the extension
$x|_{\prec i} \sqcup x|_i$ of $x|_{\prec i}$ is clearly equal to $x|_{\preceq i}$. For ease of notation, we sometimes
simply write $x(i)$ instead of $x|_i$ in such expressions, i.e. we identify an element $a$
of the $i$th lattice $L_i$ with the function from $\{i\}$ to $L_i$ which maps $i$ to $a$. Similarly,
$x|_{\prec i} \sqcup x(i) \sqcup x|_{\not\preceq i} = x$.

We will use the symbols $x, y$ to denote elements of an entire product lattice
$L$; $a, b$ to denote elements of a single level $L_i$ and $u, v$ to denote elements of $L|_{\prec i}$.

3.2 Operators on Product Lattices 

Let $(I, \preceq)$ be a well-founded index set and let $L = \bigotimes_{i \in I} L_i$ be the product lattice
of lattices $(L_i, \leq_i)_{i \in I}$. Intuitively, an operator $O$ on $L$ is stratifiable over the order
$\preceq$ if the value $(O(x))(i)$ of $O(x)$ in the $i$th stratum only depends on values $x(j)$
for which $j \preceq i$. This is formalized in the following definition.

Definition 3.2.1. An operator $O$ on a product lattice $L$ is stratifiable iff
$\forall x, y \in L, \forall i \in I$ : if $x|_{\preceq i} = y|_{\preceq i}$ then $O(x)|_{\preceq i} = O(y)|_{\preceq i}$.
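
On a small finite product lattice, this definition can be checked by brute force. The following Python sketch assumes, purely for illustration, the index set {0, 1} with its usual order and the two-element chain {0, 1} at each level; elements are tuples, and all names are hypothetical.

```python
from itertools import product

LEVELS = 2  # index set {0, 1} with the usual order 0 < 1
ELEMENTS = list(product((0, 1), repeat=LEVELS))  # each level is the chain 0 <= 1


def is_stratifiable(op):
    """Brute-force check: whenever x and y agree up to level i,
    op(x) and op(y) must agree up to level i as well."""
    for x, y in product(ELEMENTS, repeat=2):
        for i in range(LEVELS):
            if x[:i + 1] == y[:i + 1] and op(x)[:i + 1] != op(y)[:i + 1]:
                return False
    return True


def bottom_up(x):
    return (x[0], max(x[0], x[1]))  # level 1 reads level 0: allowed


def top_down(x):
    return (x[1], x[0])  # level 0 reads level 1: not allowed


print(is_stratifiable(bottom_up), is_stratifiable(top_down))
```

The slice `x[:i + 1]` plays the role of the restriction to all levels up to and including `i`, which is only adequate because the index order in this sketch is total.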

It is possible to characterize stratifiability in a more constructive manner. The
following proposition shows that stratifiability of an operator $O$ on a product lattice
$L$ is equivalent to the existence of a family of operators on each lattice $L_i$ (one
for each $u \in L|_{\prec i}$), which mimics the behaviour of $O$ on this lattice.

Proposition 3.2.1. Let $O$ be an operator on a product lattice $L$. $O$ is stratifiable
iff for each $i \in I$ and $u \in L|_{\prec i}$ there exists a unique operator $O_i^u$ on $L_i$, such
that for all $x \in L$: if $x|_{\prec i} = u$ then $(O(x))(i) = O_i^u(x(i))$.

Proof. To prove the implication from left to right, let $O$ be a stratifiable operator,
$i \in I$ and $u \in L|_{\prec i}$. We define the operator $O_i^u$ on $L_i$ as
$O_i^u : L_i \to L_i : a \mapsto (O(y))(i)$, with $y$ some element of $L$ extending $u \sqcup a$.
Because of the stratifiability of $O$, this operator is well-defined and it trivially
satisfies the required condition.

To prove the other direction, suppose the right-hand side of the equivalence
holds and let $x, x'$ be elements of $L$, such that $x|_{\preceq i} = x'|_{\preceq i}$. Then for each $j \preceq i$,
$(O(x))(j) = O_j^{x|_{\prec j}}(x(j)) = O_j^{x'|_{\prec j}}(x'(j)) = (O(x'))(j)$.

The operators $O_i^u$ are called the components of $O$. Their existence allows
us to already prove one of the main theorems of this paper, which states that
it is possible to construct the fixpoints of a stratifiable operator in a bottom-up
manner w.r.t. the well-founded order on the index set.

Theorem 3.2.1. Let $O$ be a stratifiable operator on a product lattice $L$. Then
for each $x \in L$: $x$ is a fixpoint of $O$ iff $\forall i \in I$ : $x(i)$ is a fixpoint of $O_i^{x|_{\prec i}}$.

Proof. Follows immediately from proposition 3.2.1.
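
The bottom-up construction of fixpoints can be illustrated on a finite example. The Python sketch below is not from the paper: it assumes a total index order 0 < 1 < ..., the chain {0, 1} at each level, and a stratifiable operator, so that the padding used to build an extension of $u \sqcup a$ does not influence the result. All function names are hypothetical.

```python
from itertools import product


def component(op, i, u, levels):
    """Component of op at level i for the lower part u (a tuple of length i).
    Higher levels are padded with 0; for a stratifiable op the padding is irrelevant."""
    def comp(a):
        y = tuple(u) + (a,) + (0,) * (levels - i - 1)
        return op(y)[i]
    return comp


def fixpoints_by_level(op, levels, values=(0, 1)):
    """Build fixpoints one level at a time, keeping only values fixed by the component."""
    partial = [()]
    for i in range(levels):
        partial = [u + (a,) for u in partial for a in values
                   if component(op, i, u, levels)(a) == a]
    return sorted(partial)


def fixpoints_brute_force(op, levels, values=(0, 1)):
    return sorted(x for x in product(values, repeat=levels) if op(x) == x)


def example_op(x):
    # Hypothetical stratifiable operator: level 1 may read level 0, never the reverse.
    return (x[0], max(x[0], x[1]))


print(fixpoints_by_level(example_op, 2))
```

Both functions return the same set of fixpoints for this operator, which is what the theorem predicts; the level-wise version never inspects a full element whose lower part has already been rejected.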

If O is a monotone operator on a complete lattice, we are often interested in 
its least fixpoint. This can also be constructed by means of the least fixpoints 
of the components of O. Such a construction of course requires each component 
to actually have a least fixpoint as well. We will therefore first show that the 
components of a monotone operator are also monotone. 

Proposition 3.2.2. Let $O$ be a stratifiable operator on a product lattice $L$, which
is monotone w.r.t. the product order $\leq_\otimes$. Then for each $i \in I$ and $u \in L|_{\prec i}$, the
component $O_i^u : L_i \to L_i$ is monotone w.r.t. the order $\leq_i$ of the $i$th lattice $L_i$
of $L$.

Proof. Let $i$ be an index in $I$, $u$ an element of $L|_{\prec i}$ and $a, b$ elements of $L_i$, such
that $a \leq_i b$. Let $x, y \in L$, such that $x$ extends $u \sqcup a$, $y$ extends $u \sqcup b$ and for
each $j \not\preceq i$, $x(j) = y(j)$. Because of the definition of $\leq_\otimes$, clearly $x \leq_\otimes y$ and
therefore $\forall j \in I$ : $O_j^{x|_{\prec j}}(x(j)) = (O(x))(j) \leq_j (O(y))(j) = O_j^{y|_{\prec j}}(y(j))$, which,
taking $j = i$, implies $O_i^u(a) \leq_i O_i^u(b)$.

Now, we can prove that the least fixpoints of the components of a monotone
stratifiable operator indeed form the least fixpoint of the operator itself. We will
do this by first proving the following, slightly more general proposition, which we
will be able to reuse later on.

Proposition 3.2.3. Let $O$ be a monotone operator on a complete product lattice
$L$ and let for each $i \in I$, $u \in L|_{\prec i}$, $P_i^u$ be a monotone operator on $L_i$ (not
necessarily a component of $O$), such that:

$x$ is a fixpoint of $O$ iff $\forall i \in I$ : $x(i)$ is a fixpoint of $P_i^{x|_{\prec i}}$.

Then the following equivalence also holds:

$x$ is the least fixpoint of $O$ iff $\forall i \in I$ : $x(i)$ is the least fixpoint of $P_i^{x|_{\prec i}}$.

Proof. To prove the implication from left to right, let $x$ be the least fixpoint of
$O$ and let $i$ be an arbitrary index in $I$. We will show that for each fixpoint $a$ of
$P_i^{x|_{\prec i}}$, $a \geq x(i)$. So, let $a$ be such a fixpoint. We can inductively extend
$x|_{\prec i} \sqcup a$ to an element $y$ of $L$ by defining for all $j \not\preceq i$, $y(j)$ as
$lfp(P_j^{y|_{\prec j}})$. Because of the well-foundedness of $\preceq$, $y$ is well defined.
Furthermore, $y$ is clearly also a fixpoint of $O$. Therefore $x \leq y$ and, by definition
of the product order on $L$, $x(i) \leq_i y(i) = a$.

To prove the other direction, let $x$ be an element of $L$, such that, for each
$i \in I$, $x(i)$ is the least fixpoint of $P_i^{x|_{\prec i}}$. Now, let $y$ be the least fixpoint of $O$.
To prove that $x = y$, it suffices to show that for each $i \in I$, $x|_{\preceq i} = y|_{\preceq i}$. We will
prove this by induction on the well-founded order $\preceq$ of $I$. If $i$ is a minimal
element of $I$, the proposition trivially holds. Now, let $i$ be an index which is not
a minimal element of $I$ and assume that for each $j \prec i$, $x|_{\preceq j} = y|_{\preceq j}$. It suffices
to show that $x(i) = y(i)$. Because $y$ is a fixpoint of $O$, $y(i)$ is a fixpoint of
$P_i^{y|_{\prec i}}$. As the induction hypothesis implies that $x|_{\prec i} = y|_{\prec i}$, $y(i)$ is also a
fixpoint of $P_i^{x|_{\prec i}}$ and therefore $x(i) \leq y(i)$. However, because $x$ is also a fixpoint
of $O$ and therefore must be greater than or equal to the least fixpoint $y$ of $O$, the
definition of the product order on $L$ implies that $x(i) \geq y(i)$ as well. Therefore
$x(i) = y(i)$.

It is worth noting that the condition that the order $\preceq$ on $I$ should be
well-founded is necessary for this proposition to hold. Indeed, consider for example the
product lattice $L = \bigotimes_{z \in \mathbb{Z}} \{0, 1\}$, with $\mathbb{Z}$ the integers ordered by their usual,
non-well-founded order. Let $O$ be the operator mapping each $x \in L$ to the element
$y : \mathbb{Z} \to \{0, 1\}$ of $L$, which maps each $z \in \mathbb{Z}$ to 0 if $x(z - 1) = 0$ and to 1 otherwise.
This operator is stratifiable over the order $\leq$ of $\mathbb{Z}$ and its components are the family
of operators $O_z^u$, with $z \in \mathbb{Z}$ and $u \in L|_{<z}$, which are defined as mapping both 0
and 1 to 0 if $u(z - 1) = 0$ and to 1 otherwise. Clearly, the bottom element $\bot_L$ of
$L$, which maps each $z \in \mathbb{Z}$ to 0, is the least fixpoint of $O$. However, the element
$x \in L$ which maps each $z \in \mathbb{Z}$ to 1 satisfies the condition that for each $z \in \mathbb{Z}$,
$x(z)$ is the least fixpoint of $O_z^{x|_{<z}}$, but is not the least fixpoint of $O$.

Together with theorem 3.2.1 and proposition 3.2.2, this proposition implies
that for each monotone stratifiable operator $O$ on a product lattice $L$, an element
$x \in L$ is the least fixpoint of $O$ iff $\forall i \in I$, $x(i)$ is the least fixpoint of
$O_i^{x|_{\prec i}}$. In other words, the least fixpoint of a stratifiable operator can also be
incrementally constructed.

3.3 Approximations on Product Lattices 

In section 2.2, we introduced several concepts from approximation theory,
pointing out that we are mainly interested in studying Kripke-Kleene, stable and
well-founded fixpoints of approximations. Similar to our treatment of general
operators in the previous section, we will in this section investigate the
relation between these various fixpoints of an approximation and its components.
In doing so, it will be convenient to switch to an alternative representation of
the bilattice $L^2$ of a product lattice $L = \bigotimes_{i \in I} L_i$. Indeed, this bilattice is clearly
isomorphic to the structure $\bigotimes_{i \in I} L_i^2$, i.e. to a product lattice of bilattices. From
now on, we will not distinguish between these two representations. More
specifically, when viewing $A$ as a stratifiable operator, it will be convenient to consider
its domain equal to $\bigotimes_{i \in I} L_i^2$, while when viewing $A$ as an approximation, the
representation $(\bigotimes_{i \in I} L_i)^2$ is more natural.

Obviously, this isomorphism and the results of the previous section already
provide a way of constructing the Kripke-Kleene fixpoint of a stratifiable
approximation $A$, by means of its components $A_i^u$. Also, it is clear that if $A$ is both
exact and stratifiable, the unique operator $O$ approximated by $A$ is stratifiable as
well. Indeed, this is a trivial consequence of the fact that $A(x, x) = (O(x), O(x))$
for each $x \in L$.

These results leave only the stable and well-founded fixpoints of $A$ to be
investigated. We will first examine the operators $A^1(\cdot, y)$ and $A^2(x, \cdot)$, and then
move on to the lower and upper stable operators $C_A^{\downarrow}$ and $C_A^{\uparrow}$, before finally
getting to the partial stable operator $C_A$ itself.

Proposition 3.3.1. Let $L$ be a product lattice and let $A : L^2 \to L^2$ be a
stratifiable approximation. Then, for each $x, y \in L$, the operators $A^1(\cdot, y)$, $A^2(x, \cdot)$
are also stratifiable. Moreover, for each $i \in I$, $u \in L|_{\prec i}$, the components of these
operators are:

$(A^1(\cdot, y))_i^u = (A_i^{(u, y|_{\prec i})})^1(\cdot, y(i))$,

$(A^2(x, \cdot))_i^u = (A_i^{(x|_{\prec i}, u)})^2(x(i), \cdot)$.

Proof. Let $x, y$ be elements of $L$, $i$ an element of $I$. Then, because $A$ is stratifiable,
$(A(x, y))(i) = (A_i^{(x, y)|_{\prec i}})(x(i), y(i))$. From this, the two equalities follow.

In the previous section, we showed that the components of a monotone
operator are monotone as well (proposition 3.2.2). This result implies that the
components $A_i^u$ of a stratifiable approximation are also approximations.
Therefore, such a component $A_i^u$ also has a lower and upper stable operator
$C^{\downarrow}_{A_i^u}$ and $C^{\uparrow}_{A_i^u}$. It turns out that the lower and upper stable operators of
the components of $A$ characterize the components of the lower and upper stable
operators of $A$.

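Proposition 3.3.2. Let $L$ be a product lattice and let $A$ be a stratifiable
approximation on $L^2$. Then the operators $C_A^{\downarrow}$ and $C_A^{\uparrow}$ are also stratifiable. Moreover,
for each $x, y \in L$,

$x = C_A^{\downarrow}(y)$ iff for each $i \in I$, $x(i) = C^{\downarrow}_{A_i^{(x, y)|_{\prec i}}}(y(i))$,

$y = C_A^{\uparrow}(x)$ iff for each $i \in I$, $y(i) = C^{\uparrow}_{A_i^{(x, y)|_{\prec i}}}(x(i))$.
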

Proof. Let $x, y$ be elements of $L$. Because $A^1(\cdot, y)$ is stratifiable (proposition
3.3.1), the corollary to proposition 3.2.3 implies that $x = C_A^{\downarrow}(y) = lfp(A^1(\cdot, y))$
iff for each $i \in I$, $x(i) = lfp((A^1(\cdot, y))_i^{x|_{\prec i}})$. Because of proposition 3.3.1,
this is in turn equivalent to: for each $i \in I$,
$x(i) = lfp((A_i^{(x, y)|_{\prec i}})^1(\cdot, y(i))) = C^{\downarrow}_{A_i^{(x, y)|_{\prec i}}}(y(i))$.
The proof of the second equivalence is similar.

This proposition shows how, for each $x, y \in L$, $C_A^{\downarrow}(y)$ and $C_A^{\uparrow}(x)$ can be
constructed incrementally from the upper and lower stable operators
corresponding to the components of $A$. This result also implies a similar property for
the partial stable operator $C_A$ of an approximation $A$.

Proposition 3.3.3. Let $L$ be a product lattice and let $A : L^2 \to L^2$ be a
stratifiable approximation. Then the operator $C_A$ is also stratifiable. Moreover, for each
$x, x', y, y' \in L$, the following equivalence holds:

$(x', y') = C_A(x, y)$ iff $\forall i \in I$ : $x'(i) = C^{\downarrow}_{A_i^{(x', y)|_{\prec i}}}(y(i))$ and
$y'(i) = C^{\uparrow}_{A_i^{(x, y')|_{\prec i}}}(x(i))$.

Proof. Follows immediately from proposition 3.3.2. 

It should be noted that the components $(C_A)_i^{(u,v)}$ of the partial stable
operator of a stratifiable approximation $A$ are not equal to the partial stable operators
$C_{A_i^{(u,v)}}$ of the components of $A$. Indeed, $(C_A)_i^{(u,v)} = ((C_A^{\downarrow})_i^v, (C_A^{\uparrow})_i^u)$,
whereas $C_{A_i^{(u,v)}} = (C^{\downarrow}_{A_i^{(u,v)}}, C^{\uparrow}_{A_i^{(u,v)}})$. Clearly, these two pairs are,
in general, not equal, as $(C_A^{\downarrow})_i^v$ ignores the argument $u$, which does appear in
$C^{\downarrow}_{A_i^{(u,v)}}$. We can, however, characterize the fixpoints of $C_A$, i.e. the partial
stable fixpoints of $A$, by means of the partial stable fixpoints of the components of $A$.

Theorem 3.3.1. Let $L$ be a product lattice and let $A : L^2 \to L^2$ be a stratifiable
approximation. Then for each element $(x, y)$ of $L^2$:

$(x, y)$ is a fixpoint of $C_A$ iff $\forall i \in I$ : $(x, y)(i)$ is a fixpoint of $C_{A_i^{(x, y)|_{\prec i}}}$.

Proof. Let $x, y$ be elements of $L$, such that $(x, y) = C_A(x, y)$. By proposition
3.3.3, this is equivalent to $\forall i \in I$, $x(i) = C^{\downarrow}_{A_i^{(x, y)|_{\prec i}}}(y(i))$ and
$y(i) = C^{\uparrow}_{A_i^{(x, y)|_{\prec i}}}(x(i))$.

By proposition 3.2.3, this theorem has the following corollary: 

Corollary 1. Let $L$ be a product lattice and let $A : L^2 \to L^2$ be a stratifiable
approximation. Then for each element $(x, y)$ of $L^2$: $(x, y) = lfp(C_A)$ iff $\forall i \in I$ :
$(x, y)(i) = lfp(C_{A_i^{(x, y)|_{\prec i}}})$.

Putting all of this together, the main results of this section can be
summarized as follows. If $A$ is a stratifiable approximation on a product lattice $L$, then
a pair $(x, y)$ is a fixpoint, Kripke-Kleene fixpoint, stable fixpoint or well-founded
fixpoint of $A$ iff for each $i \in I$, $(x(i), y(i))$ is a fixpoint, Kripke-Kleene fixpoint,
stable fixpoint or well-founded fixpoint of the component $A_i^{(x, y)|_{\prec i}}$ of $A$.
Moreover, if $A$ is exact then an element $x \in L$ is a fixpoint of the unique operator
$O$ approximated by $A$ iff for each $i \in I$, $(x(i), x(i))$ is a fixpoint of the
component $A_i^{(x, x)|_{\prec i}}$ of $A$. These characterizations give us a way of incrementally
constructing each of these fixpoints.

4 Application to Logic Programming 

The general, algebraic framework of stratifiable operators developed in the
previous section allows us to easily and uniformly prove splitting theorems for all
fixpoint semantics of non-monotonic reasoning formalisms. We will demonstrate
this by applying the previous results to logic programming.

4.1 Syntax and Semantics 

For simplicity, we will deal with propositional logic programs. Let $\Sigma$ be an
alphabet, i.e. a collection of symbols which are called atoms. A literal is either
an atom $p$ or the negation $\neg q$ of an atom $q$. A logic program is a set of clauses
of the form $h \leftarrow b_1, \ldots, b_n$. Here, $h$ is an atom and all $b_i$ are literals. For such a
clause $r$, we denote $h$ by $head(r)$ and the set $\{b_1, \ldots, b_n\}$ by $body(r)$.

Logic programs can be interpreted in the lattice $(2^\Sigma, \subseteq)$, i.e. the powerset of
$\Sigma$. This set of interpretations of $\Sigma$ is denoted by $I_\Sigma$. Following the framework
of approximation theory, we will, however, interpret programs in the bilattice
$B_\Sigma = I_\Sigma^2$. In keeping with the intuitions presented in section 2.2, for such a pair
$(X, Y)$, the interpretation $X$ can be seen as representing an underestimate of the
set of true atoms, while $Y$ represents an overestimate. Or, to put it another way,
$X$ contains all atoms which are certainly true, while $Y$ contains atoms which are
possibly true. These intuitions lead naturally to the following definition of the
truth value of a propositional formula.

Definition 4.1.1. Let $\varphi, \psi$ be propositional formulas in an alphabet $\Sigma$, $a$ an
atom of $\Sigma$ and let $(X, Y) \in B_\Sigma$. We define

- $H_{(X,Y)}(a) = t$ iff $a \in X$;

- $H_{(X,Y)}(\varphi \wedge \psi) = t$ iff $H_{(X,Y)}(\varphi) = t$ and $H_{(X,Y)}(\psi) = t$;

- $H_{(X,Y)}(\varphi \vee \psi) = t$ iff $H_{(X,Y)}(\varphi) = t$ or $H_{(X,Y)}(\psi) = t$;

- $H_{(X,Y)}(\neg \varphi) = t$ iff $H_{(Y,X)}(\varphi) = f$.

Note that to evaluate the negation $\neg \varphi$ of a formula in a pair $(X, Y)$, we
actually evaluate $\varphi$ in $(Y, X)$. Indeed, the negation of a formula will be certain
if the formula itself is not possible and vice versa. Using this definition, we can
now define the following operator on $B_\Sigma$.

Definition 4.1.2. Let $P$ be a logic program with an alphabet $\Sigma$. The operator
$\mathcal{T}_P$ on $B_\Sigma$ is defined as: $\mathcal{T}_P(X, Y) = (U_P(X, Y), U_P(Y, X))$, with
$U_P(X, Y) = \{p \in \Sigma \mid \exists r \in P : head(r) = p, H_{(X,Y)}(body(r)) = t\}$.
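
For programs whose bodies consist of literals only, this definition translates almost literally into code. The Python sketch below is an illustration, not the authors' implementation: the encoding of a clause as a triple (head, positive body atoms, negated body atoms) and the names `u_p` and `t_p` are assumptions made for this example. The sample program is the small program used as an example in section 4.2.

```python
def u_p(program, x, y):
    """U_P(X, Y): heads of clauses whose positive body atoms are all in X
    and whose negated body atoms are all outside Y."""
    return frozenset(
        head for head, pos, neg in program
        if all(a in x for a in pos) and all(a not in y for a in neg)
    )


def t_p(program, x, y):
    """The pair (U_P(X, Y), U_P(Y, X)) from Definition 4.1.2."""
    return u_p(program, x, y), u_p(program, y, x)


# p <- not q, not r.   q <- not p, not r.   s <- p, q.
PROGRAM = [("p", (), ("q", "r")), ("q", (), ("p", "r")), ("s", ("p", "q"), ())]
SIGMA = frozenset("pqrs")

print(t_p(PROGRAM, frozenset(), SIGMA))
```

A negated atom is evaluated against the second component of the pair, which mirrors the swap of $X$ and $Y$ in the clause for negation of Definition 4.1.1.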

When restricted to consistent pairs of interpretations, this operator $\mathcal{T}_P$ is
the well known 3-valued Fitting operator [Fit85]. In [DMT00], $\mathcal{T}_P$ is shown to
be a symmetric approximation. Furthermore, it can be used to define most of
the “popular” semantics for logic programs: the operator which maps an
interpretation $X$ to $U_P(X, X)$ is the well known (two-valued) $T_P$-operator [Llo87];
the partial stable operator of $\mathcal{T}_P$ is the Gelfond-Lifschitz operator $\mathcal{GL}$ [VRS91].
Fixpoints of $T_P$ are supported models of $P$, the least fixpoint of $\mathcal{T}_P$ is the
Kripke-Kleene model of $P$, fixpoints of $\mathcal{GL}$ are (four-valued) stable models of $P$ and its
least fixpoint is the well-founded model of $P$.
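
For finite propositional programs, these models can be computed by plain iteration. The following Python sketch is a hedged illustration only: it assumes clauses encoded as triples (head, positive body atoms, negated body atoms), and every function name is hypothetical. It uses the fact that the approximation is symmetric, so a single function serves as both the lower and the upper stable operator.

```python
def u_p(program, x, y):
    """Heads of clauses with positive body atoms in x and negated atoms outside y."""
    return frozenset(h for h, pos, neg in program
                     if all(a in x for a in pos) and all(a not in y for a in neg))


def lfp(op, start):
    """Iterate op from start until a fixpoint is reached (finite case only)."""
    x = start
    while op(x) != x:
        x = op(x)
    return x


def kripke_kleene(program, sigma):
    """Least fixpoint of the approximation, starting from the least precise pair."""
    step = lambda p: (u_p(program, p[0], p[1]), u_p(program, p[1], p[0]))
    return lfp(step, (frozenset(), sigma))


def stable_op(program, y):
    """Stable operator: least fixpoint of X -> U_P(X, Y)."""
    return lfp(lambda x: u_p(program, x, y), frozenset())


def well_founded(program, sigma):
    """Least fixpoint of the partial stable operator (X, Y) -> (C(Y), C(X))."""
    step = lambda p: (stable_op(program, p[1]), stable_op(program, p[0]))
    return lfp(step, (frozenset(), sigma))


# Hypothetical one-clause program: p <- p.
PROGRAM = [("p", ("p",), ())]
SIGMA = frozenset("p")
print(kripke_kleene(PROGRAM, SIGMA), well_founded(PROGRAM, SIGMA))
```

On this program the two semantics differ: the Kripke-Kleene pair leaves `p` possible but not certain, while the well-founded pair makes `p` false, because the stable operator only accepts atoms with a non-circular derivation.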

4.2 Stratification 

Our discussion of the stratification of logic programs will be based on the
dependencies between atoms, which are expressed by a logic program. These induce
the following partial order on the alphabet of the program.

Definition 4.2.1. Let $P$ be a logic program with alphabet $\Sigma$. The dependency
order $\leq_{dep}$ on $\Sigma$ is defined as: for all $p, q \in \Sigma$: $p \leq_{dep} q$ iff $\exists r \in P$ :
$q = head(r)$, $p \in body(r)$.

To illustrate this definition, consider the following small program: 

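$E = \{\; p \leftarrow \neg q, \neg r. \quad q \leftarrow \neg p, \neg r. \quad s \leftarrow p, q. \;\}$

The dependency order of this program can be graphically represented as:

[Figure: graph of the dependency order on the atoms r, p, q and s]

In other words, $r \leq_{dep} p$, $r \leq_{dep} q$, $p \leq_{dep} q$, $q \leq_{dep} p$, $p \leq_{dep} s$ and $q \leq_{dep} s$.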

Based on this dependency order, the concept of a splitting of the alphabet of
a logic program can be defined.

Definition 4.2.2. Let $P$ be a logic program with alphabet $\Sigma$. A splitting of $P$
is a partition $(\Sigma_i)_{i \in I}$ of $\Sigma$, such that the well-founded order $\preceq$ on $I$ agrees with
the dependency order $\leq_{dep}$ of $P$, i.e. if $p \leq_{dep} q$, $p \in \Sigma_i$ and $q \in \Sigma_j$, then $i \preceq j$.

For instance, the following partition is a splitting of the program $E$: $\Sigma_0 = \{r\}$,
$\Sigma_1 = \{p, q\}$ and $\Sigma_2 = \{s\}$ (with the totally ordered set $\{0, 1, 2\}$ as index set).
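
Whether a given partition is a splitting can be checked mechanically. The Python sketch below follows Definitions 4.2.1 and 4.2.2 literally, under assumptions made only for this illustration: clauses are triples (head, positive body atoms, negated body atoms), an atom counts as occurring in a body whether or not it is negated, and the index set is a set of integers with the usual order. The function names are hypothetical.

```python
def dependency_pairs(program):
    """Pairs (p, q) with p <=_dep q: p occurs in the body of a clause with head q."""
    return {(a, head) for head, pos, neg in program for a in pos + neg}


def is_splitting(program, level):
    """level maps each atom to the index of its block in the partition."""
    return all(level[p] <= level[q] for p, q in dependency_pairs(program))


# p <- not q, not r.   q <- not p, not r.   s <- p, q.
PROGRAM = [("p", (), ("q", "r")), ("q", (), ("p", "r")), ("s", ("p", "q"), ())]

print(is_splitting(PROGRAM, {"r": 0, "p": 1, "q": 1, "s": 2}))
print(is_splitting(PROGRAM, {"r": 0, "p": 1, "q": 2, "s": 2}))
```

The second partition is rejected because `p` and `q` depend on each other and therefore have to share a level.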

If $(\Sigma_i)_{i \in I}$ is a partition of the alphabet $\Sigma$ of a logic program $P$, the product
lattice $\bigotimes_{i \in I} 2^{\Sigma_i}$ is isomorphic to the powerset $2^\Sigma$. We can therefore view the
operator $\mathcal{T}_P$ of such a program as being an operator on the bilattice of this
product lattice, instead of on the original lattice $B_\Sigma$. Moreover, if such a partition
is a splitting of $P$, the $\mathcal{T}_P$-operator on this product lattice is stratifiable.

Theorem 4.2.1. Let $P$ be a logic program and let $(\Sigma_i)_{i \in I}$ be a splitting of this
program. Then the operator $\mathcal{T}_P$ on the bilattice of the product lattice
$\bigotimes_{i \in I} 2^{\Sigma_i}$ is stratifiable.

Proof. Let $i \in I$ and $(X,Y), (X',Y') \in \mathcal{B}_\Sigma$, s.t. $X|_{\preceq i} = X'|_{\preceq i}$ and $Y|_{\preceq i} = Y'|_{\preceq i}$.
It suffices to show that for each clause $c$ with an atom from $\Sigma_i$ in its head,
$H_{(X,Y)}(body(c)) = H_{(X',Y')}(body(c))$. By definition 4.2.2, this is trivially so.

By theorem 3.3.1, this theorem implies that, for a stratifiable program $P$, it
is possible to stratify the operators $T_P$, $\mathcal{T}_P$ and $\mathcal{GL}$. In other words, it is
possible to split logic programs w.r.t. supported model, Kripke-Kleene, stable model
and well-founded semantics. Moreover, the supported, Kripke-Kleene, stable and
well-founded models of $P$ can be computed from, respectively, the supported,
Kripke-Kleene, stable and well-founded models of the components of $\mathcal{T}_P$.

In order to be able to perform this construction in practice, however, we also 
need a more precise characterization of these components. We will now show 
how to construct new logic programs from the original program, such that these 
components correspond to an operator associated with these new programs. 
First, we will define the restriction of a program to a subset of its alphabet. 

Definition 4.2.3. Let $P$ be a logic program with a splitting $(\Sigma_i)_{i \in I}$. For each
$i \in I$, the program $P_i$ consists of all clauses with an atom from $\Sigma_i$ in their head.

In the case of our example, the program $E$ is partitioned in $\{E_0, E_1, E_2\}$ with
$E_0 = \{\}$, $E_1 = \{p \leftarrow \neg q, \neg r.\;\; q \leftarrow \neg p, \neg r.\}$ and $E_2 = \{s \leftarrow p, q.\}$.

If $P$ has a splitting $(\Sigma_i)_{i \in I}$, then clearly such a program $P_i$ contains, by
definition, only atoms from $\bigcup_{j \preceq i} \Sigma_j$. When given a pair $(U, V)$ of interpretations
of $\bigcup_{j \prec i} \Sigma_j$, we can therefore construct a program containing only atoms from
$\Sigma_i$ by replacing each other atom by its truth-value according to $(U, V)$.

Definition 4.2.4. Let $P$ be a logic program with a splitting $(\Sigma_i)_{i \in I}$. For each $i \in I$
and $(U,V) \in \mathcal{B}_\Sigma|_{\prec i}$, we define $P_i((U,V))$ as the new logic program $P'$, which
results from replacing each literal $l$ whose atom is in $\bigcup_{j \prec i} \Sigma_j$ by $H_{(U,V)}(\{l\})$.
Of course, one can further simplify such a program by removing all clauses
containing $\mathbf{f}$ and just omitting all atoms $\mathbf{t}$. Programs constructed in this way are
now precisely those which characterize the components of the operator $\mathcal{T}_P$.
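
The construction of $P_i((U,V))$, including the simplification step, can be sketched as follows. This is an illustrative sketch under stated assumptions: the clause encoding and the names `evaluate` and `reduce_program` are hypothetical, and $H_{(U,V)}$ is assumed to evaluate a positive literal in $U$ and a negative literal in $V$, which is the reading consistent with the example worked out later in this section.

```python
def evaluate(literal, U, V):
    """Assumed reading of H_(U,V): positive literals are looked up in U, negative ones in V."""
    atom, positive = literal
    return atom in U if positive else atom not in V


def reduce_program(program_i, lower_atoms, U, V):
    """Build the simplified P_i((U,V)): drop clauses with a false lower literal, omit true ones."""
    reduced = []
    for head, body in program_i:
        lower = [lit for lit in body if lit[0] in lower_atoms]
        if all(evaluate(lit, U, V) for lit in lower):
            remaining = [lit for lit in body if lit[0] not in lower_atoms]
            reduced.append((head, remaining))
    return reduced


# The strata E_1 and E_2 of the running example (hypothetical encoding:
# a clause is (head, body), a body literal is (atom, is_positive)).
E1 = [("p", [("q", False), ("r", False)]), ("q", [("p", False), ("r", False)])]
E2 = [("s", [("p", True), ("q", True)])]

if __name__ == "__main__":
    print(reduce_program(E1, {"r"}, set(), set()))
    print(reduce_program(E2, {"r", "p", "q"}, set(), {"p", "q"}))
    print(reduce_program(E2, {"r", "p", "q"}, {"p", "q"}, set()))
```

Note that the third call passes the pair in swapped order, mirroring the second component $P_i((V,U))$ used in the characterization of the components.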

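Theorem 4.2.2. Let $P$ be a logic program with a splitting $(\Sigma_i)_{i \in I}$. For each
$i \in I$, $(U,V) \in \mathcal{B}_\Sigma|_{\prec i}$ and $(A, B) \in \mathcal{B}_{\Sigma_i}$:

$$(\mathcal{T}_P)_i^{(U,V)}(A,B) = \left(U_{P_i((U,V))}(A,B),\; U_{P_i((V,U))}(B,A)\right).$$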

Proof. Let $i$, $U$, $V$, $A$ and $B$ be as above. Then because the order $\preceq$ on $I$ agrees
with the dependency order of $P$, $(\mathcal{T}_P)_i^{(U,V)}(A, B) = (A', B')$, with

$$A' = \{p \in \Sigma_i \mid \exists r \in P : head(r) = p,\; H_{(U \cup A, V \cup B)}(body(r)) = \mathbf{t}\}$$

$$B' = \{p \in \Sigma_i \mid \exists r \in P : head(r) = p,\; H_{(V \cup B, U \cup A)}(body(r)) = \mathbf{t}\}$$

We will show that $A' = U_{P_i((U,V))}(A, B)$; the proof that $B' = U_{P_i((V,U))}(B, A)$
is similar. Let $r$ be a clause of $P$, such that $head(r) \in \Sigma_i$. Then
$H_{(U \cup A, V \cup B)}(body(r)) = \mathbf{t}$ iff $H_{(U,V)}(l) = \mathbf{t}$ for each literal $l$ with an atom from
$\bigcup_{j \prec i} \Sigma_j$ and $H_{(A,B)}(l') = \mathbf{t}$ for each literal $l'$ with an atom from $\Sigma_i$. Because for
each literal $l$ with an atom from $\bigcup_{j \prec i} \Sigma_j$, $H_{(U,V)}(l) = \mathbf{t}$ precisely iff $l$ was replaced
by $\mathbf{t}$ in $P_i((U,V))$, this is in turn equivalent to $H_{(A,B)}(body(r((U,V)))) = \mathbf{t}$ (by $r((U,V))$ we
denote the clause which replaces $r$ in $P_i((U, V))$).

It is worth noting that this theorem implies that a component $(\mathcal{T}_P)_i^{(U,V)}$ is,
in contrast to the operator $\mathcal{T}_P$ itself, not necessarily exact.

With this final theorem, we can now incrementally compute the various fixpoints
of the operator $\mathcal{T}_P$. We will illustrate this by computing the well-founded
model of our example $E$. Recall that this program is partitioned into the
programs $E_0 = \{\}$, $E_1 = \{p \leftarrow \neg q, \neg r.\;\; q \leftarrow \neg p, \neg r.\}$ and $E_2 = \{s \leftarrow p, q.\}$. The
well-founded model of $E_0$ is $(\{\},\{\})$. Replacing the atom $r$ in $E_1$ by its truth-value
according to this interpretation yields the new program $E_1' = E_1((\{\},\{\})) = \{p \leftarrow \neg q.\;\; q \leftarrow \neg p.\}$.
The well-founded model of this program is $(\{\}, \{p, q\})$. Replacing
the atoms $p$ and $q$ in $E_2$ by their truth-value according to the pair of
interpretations $(\{\},\{p, q\})$ gives the new program $E_2' = E_2((\{\}, \{p, q\})) = \{\}$. Replacing
these by their truth-value according to the pair of interpretations $(\{p, q\},\{\})$
gives the new program $E_2'' = E_2((\{p, q\}, \{\})) = \{s\}$. The well-founded fixpoint
of $(U_{E_2'}, U_{E_2''})$ is $(\{\},\{s\})$. Therefore, the well-founded model of the entire
program $E$ is $(\{\} \cup \{\} \cup \{\}, \{\} \cup \{p,q\} \cup \{s\}) = (\{\}, \{p, q, s\})$.
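
The result of this incremental computation can be cross-checked by computing the well-founded model of the whole program $E$ directly. The sketch below does not follow the stratified construction of the text; it uses the standard alternating-fixpoint iteration of the Gelfond-Lifschitz operator instead. The clause encoding and all function names are assumptions made for illustration.

```python
def least_model(definite_program):
    """Least model of a negation-free program given as (head, [atoms]) clauses."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in definite_program:
            if head not in model and all(atom in model for atom in body):
                model.add(head)
                changed = True
    return model


def gelfond_lifschitz(program, interpretation):
    """Least model of the reduct of the program with respect to the interpretation."""
    reduct = [
        (head, [atom for atom, positive in body if positive])
        for head, body in program
        if all(positive or atom not in interpretation for atom, positive in body)
    ]
    return least_model(reduct)


def well_founded_model(program):
    """Return (true atoms, possibly true atoms) by iterating the operator twice per step."""
    lower = set()
    while True:
        upper = gelfond_lifschitz(program, lower)
        new_lower = gelfond_lifschitz(program, upper)
        if new_lower == lower:
            return lower, upper
        lower = new_lower


# Hypothetical encoding: a clause is (head, body), a body literal is (atom, is_positive).
E = [
    ("p", [("q", False), ("r", False)]),
    ("q", [("p", False), ("r", False)]),
    ("s", [("p", True), ("q", True)]),
]

if __name__ == "__main__":
    print(well_founded_model(E))
```

For the program $E$ this yields the same pair as the stratum-by-stratum computation: no atom is true, and $p$, $q$ and $s$ are possibly true.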

Of course, it is also possible to apply these results to more complicated
programs. Consider for instance the following program in the natural numbers:

$$Even = \left\{ \begin{array}{l} even(0). \\ odd(X + 1) \leftarrow even(X). \\ even(X + 1) \leftarrow odd(X). \end{array} \right\}$$

which can be seen as an abbreviation of the infinite propositional logic program:
$\{even(0).\;\; odd(1) \leftarrow even(0).\;\; even(1) \leftarrow odd(0).\;\; \cdots\}$. Clearly, the operator
$\mathcal{T}_{Even}$ is stratifiable w.r.t. the partition $(\{even(n), odd(n)\})_{n \in \mathbb{N}}$ (using the
standard order on the natural numbers) of the alphabet $\{even(n) \mid n \in \mathbb{N}\} \cup \{odd(n) \mid n \in \mathbb{N}\}$.
The component $(\mathcal{T}_{Even})_0$ of this operator corresponds to the
program $Even_0 = \{even(0).\}$, which has $\{even(0)\}$ as its only fixpoint. Let $n \in \mathbb{N}$
and $U_n = \{even(i) \mid i < n, i \text{ is even}\} \cup \{odd(i) \mid i < n, i \text{ is odd}\}$. Clearly, if $n$
is even, the component $(\mathcal{T}_{Even})_n^{(U_n, U_n)}$ corresponds to the program $\{even(n).\}$,
while if $n$ is odd $(\mathcal{T}_{Even})_n^{(U_n, U_n)}$ corresponds to the program $\{odd(n).\}$. This
proves that the supported, Kripke-Kleene, stable and well-founded models of
$Even$ all contain precisely those atoms $even(n)$ for which $n$ is an even natural
number and those atoms $odd(n)$ for which $n$ is an odd natural number.

4.3 Related Work 

In [LT94], Lifschitz and Turner proved a splitting theorem for logic programs 
under stable model semantics. They, however, considered logic programs in an 
extended syntax, which allows disjunction in the head of clauses and two kinds 
of negation (negation-as-failure and classical negation). When considering only 
programs in the syntax described here, our results generalize their results to 
include supported model, Kripke-Kleene and well-founded semantics as well. 
While we have not done so here, the fact that the stable model semantics for such 
extended programs can also be characterized as a fixpoint semantics [LRS95], 
seems to suggest that our approach could be used to obtain similar results for 
this extended syntax as well. In future work, we plan to investigate this further. 

5 Conclusion 

Stratification is, both theoretically and practically, an important concept in 
knowledge representation. We have studied this issue at a general, algebraic 
level by investigating stratification of operators and approximations (section 3). 
This gave us a small but very useful set of theorems, which can be used to easily 
and uniformly prove splitting results for all fixpoint semantics of logic programs, 
auto-epistemic logic theories and default logic theories, thus generalizing existing 
results. In section 4, we demonstrated this for the case of logic programming. 

As such, the importance of the work presented here is threefold. Firstly, there
are the concrete, applied results of section 4 themselves. Secondly, and more
importantly, there is the general, algebraic framework for the study of stratification,
which can be applied to every formalism with a fixpoint semantics. Finally, on
a more abstract level, our work offers greater insight into the principles
underlying various existing splitting results, as we are able to “look beyond” purely
syntactical properties of a certain formalism.

Abstract. Now that answer set programming has emerged as a practical tool
for knowledge representation and declarative problem solving there has recently
been a revival of interest in transformation rules that allow for programs to be
simplified and perhaps even reduced to programs of ‘lower’ complexity. Although it
has been known for some time that there is a maximal monotonic logic, denoted by
N5, with the property that its valid (equivalence preserving) inference rules
provide valid transformations of programs under answer set semantics, with few
exceptions this fact has not really been exploited in the literature. The paper studies
some new transformation rules using N5-inference to simplify extended disjunctive
logic programs known to be strongly equivalent to programs with nested
expressions.



1 Introduction 

With the emergence of answer set solvers such as DLV [20], GnT [18], and smodels
[33], answer set programming (ASP) now provides a practical and viable environment
for tasks of knowledge representation and declarative problem solving. Applications
of this paradigm include planning and diagnosis, as exemplified in a prototype
decision support system for the space shuttle [2], the management of heterogeneous data in
information systems, as performed in the INFOMIX project¹, the representation of
ontologies in the semantic web allowing for default knowledge and inference, as discussed
in [5], as well as compact and fully declarative representations of hard combinatorial
problems such as n-Queens, Hamiltonian paths, and so on².

Following the rise of ASP as a practical tool, there has recently been a revival of
interest in transformation rules that allow for a program to be simplified and perhaps even
reduced to a program of ‘lower’ complexity, eg reducing a disjunctive program to a
normal program. Recent studies have included [4, 25, 31, 11]. Although it has been known
since [27] that there is a maximal monotonic logic, denoted by N5, with the property
that all of its valid (equivalence preserving) inference rules provide valid
transformations of programs under answer set semantics, with few exceptions this fact has not
really been exploited in the literature. In this paper we explore several ways in which
inference in N5 can be used for program simplification and other computational
purposes. The main contributions of the paper, in order of presentation, are as follows. In
§2 we give an informal account of how the logic N5 can be employed as a tool for
program transformations in ASP. §3 provides the logical background and summarises the
main known results showing how N5 provides a suitable logical foundation for ASP. In
§4 we illustrate how N5 can be used to check that a given transformation rule preserves
semantic equivalence of the programs concerned and we discuss new rules for program
simplification based on deriving literals (and their negations) from the program in N5.
Continuing this theme, in §5 we show how other kinds of N5-derivability may yield
computationally useful metatheoretic properties. Some brief remarks on the complexity
of these methods follow in §6 and in §7 we conclude with some remarks on related
work and on future research topics.

* Partially supported by CICyT project TIC-2003-9001-C02, URJC project PPR-2003-39 and
WASP (IST-2001-37004).

1 http://sv.mat.unical.it/infomix/

2 For these and other examples as well as a general introduction to ASP, see [3].

B. Demoen and V. Lifschitz (Eds.): ICLP 2004, LNCS 3132, pp. 210-224, 2004.



2 Monotonic vs. Nonmonotonic Semantic Transformation Rules 

Since 1995 (published as [27]) it has been known that the nonclassical logic of
here-and-there with strong negation, N5, provides a suitable logical foundation for the study
of programs and theories under answer set semantics. One property of N5 is basic
here: answer sets of logic programs correspond to simple kinds of minimal N5-models,
called equilibrium models. It was at once apparent that this property could be useful in
evaluating putative transformation rules for logic programs, in particular to check the
property that a rule is valid under answer set semantics, ie preserves the answer sets
of the program being transformed. In particular, any transformation of a program that
proceeds according to a valid, given or derived, inference rule of N5 will lead to a
logically equivalent program having the same models and therefore the same (minimal)
equilibrium models and the same answer sets. It is evident that there are two immediate
applications for this property: it may be used to give rather simple proofs that certain
known transformation rules are valid for answer set semantics, and it may prove useful
in helping to find new rules that preserve the semantics. This was pointed out in [27]
but not systematically exploited at the time. More recently this fact was used by others,
notably Osorio et al [25], to verify the validity of certain rules such as TAUT, RED⁻,
NONMIN and others considered by Brass and Dix [4]. It is interesting to note that while
this shows that answer sets are preserved under any transformations valid in
intuitionistic logic, besides some stronger ones, the same is not true of the weaker well-founded
semantics, WFS. There are intuitionistically valid transformations that do not preserve
the well-founded semantics of a program³.

In fact it is easy to see that transformations of programs that are valid in N5, ie that
take a program $\Pi$ to an N5-equivalent program $\Pi'$, have a still stronger property. Not
only are the programs equivalent under answer set semantics, they must be equivalent
for all possible extensions $\Pi \cup \Sigma$, $\Pi' \cup \Sigma$, for the obvious reason that these extended
programs are also logically equivalent. This stronger form of equivalence of programs
has subsequently been studied under the rubric strong equivalence ([22]).

3 Consider the program $\Pi$ consisting of two rules (written as logical formulas) $\neg a \to a$; $\neg a \to b$.
Since in intuitionistic logic, $\neg a \to a \vdash \neg a \to b$, the second rule can be eliminated without
loss. However the WFS of the resulting program $\neg a \to a$ is different from that of $\Pi$.

There has been considerable study in the literature of rules respecting ordinary
equivalence under answer sets or WFS. Generally speaking, in the case of answer sets
we can distinguish two kinds of rules: those that are valid monotonically (ie. correspond
to valid inferences in N5), and those that we might term nonmonotonic. This
terminology is justified by the following observation. Any transformation valid in N5
is certainly monotonic and preserves answer sets, while there are transformations valid
in classical logic which do not; more precisely it can be shown that there is no proper
monotonic strengthening of N5 with the property that all of its valid rules will preserve
answer sets⁴. Therefore if a rule nonvalid in N5 has the property of preserving answer
sets, this is not, say, because it is a valid rule of classical logic or some other stronger
monotonic system, but rather because it goes beyond ordinary monotonic logic. An example
is the rule RED⁺ (see [4]) which states that if an atom $A$ does not appear in the head
of any rule of $\Pi$, then any occurrence of $\neg A$ in $\Pi$ can be deleted without affecting the
answer sets of $\Pi$. This rule is clearly not monotonic and is neither classically sound nor
sound in N5. It relies rather on the fact that answer sets correspond to certain minimal
models and obey the supportedness principle.

A second basic property of the logic N5 was proven in [22]: two programs are
strongly equivalent if and only if they are logically equivalent viewed as propositional
theories in N5. This fact has an obvious but nonetheless interesting corollary: while
transformation rules preserving ordinary equivalence may be monotonic or
nonmonotonic, as just noted, rules preserving strong equivalence must be monotonic. The
argument is immediate: any transformation that takes $\Pi$ to a program $\Pi'$ that is not
N5-equivalent does not preserve strong equivalence⁵.

3 Logical Background 

We work in the nonclassical logic of here-and-there with strong negation N5 and its 
nonmonotonic extension, equilibrium logic [27], which generalises answer set seman- 
tics for logic programs to arbitrary propositional theories, see eg [22]. We give an 
overview of the logic here; for more details see [27, 22, 28] and the logic texts cited. 

4 While there are infinitely many logics between intuitionistic and classical logic, there are none
at all between here-and-there and classical. Adding strong negation does produce two
nontrivial strengthenings of N5 but these are easily seen to be ‘stronger’ than answer set inference.

5 Though answer set semantics was often criticised in the past for failing certain principles
of nonmonotonic reasoning, such as cumulativity and relevance, this shows that answer set
inference does have one very nice property from the logical point of view. Not only does
it have a greatest deductive base logic (N5), but this logic exactly captures the equivalence
classes of strongly equivalent programs. It is not known whether the same is true for
well-founded semantics. Though we know by a result of Dietrich [8] that since WFS is cumulative
it must have a greatest deductive base, it is not known what that base is and whether it captures
strong equivalence for WFS in a similar way.

Formulas of N5 are built up in the usual way using the logical constants $\wedge$, $\vee$,
$\to$, $\neg$, $\sim$, standing respectively for conjunction, disjunction, implication, weak (or
intuitionistic) negation and strong negation. The axioms and rules of inference for N5 are
those of intuitionistic logic (see eg [6]) together with:

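1. the axiom schema $(\neg\alpha \to \beta) \to (((\beta \to \alpha) \to \beta) \to \beta)$, which characterises the
3-valued here-and-there logic of Heyting [15], and Gödel [13] (hence it is
sometimes known as Gödel’s 3-valued logic).

2. the following axiom schemata involving strong negation taken from the calculus of
Vorob’ev [36, 37] (where ‘$\alpha \leftrightarrow \beta$’ abbreviates $(\alpha \to \beta) \wedge (\beta \to \alpha)$):

N1. ${\sim}(\alpha \to \beta) \leftrightarrow \alpha \wedge {\sim}\beta$
N2. ${\sim}(\alpha \wedge \beta) \leftrightarrow {\sim}\alpha \vee {\sim}\beta$
N3. ${\sim}(\alpha \vee \beta) \leftrightarrow {\sim}\alpha \wedge {\sim}\beta$
N4. ${\sim}{\sim}\alpha \leftrightarrow \alpha$
N5. ${\sim}\neg\alpha \leftrightarrow \alpha$
N6. (for atomic $\alpha$) ${\sim}\alpha \to \neg\alpha$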

The inference relation of N5 is denoted by $\vdash$. The model theory of N5 is based on
the usual Kripke semantics for Nelson’s constructive logic N (see eg. [14, 6]), but N5
is complete for Kripke frames $\mathcal{F} = \langle W, \leq \rangle$ (where as usual $W$ is the set of points or
worlds and $\leq$ is a partial ordering on $W$) having exactly two worlds, say $h$ (‘here’) and
$t$ (‘there’) with $h \leq t$. As usual a model is a frame together with an assignment $i$ that
associates to each element of $W$ a set of literals⁶, such that if $w \leq w'$ then $i(w) \subseteq i(w')$.
An assignment is then extended inductively to all formulas via the usual rules for
conjunction, disjunction, implication and (weak) negation in intuitionistic logic, viz.



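$\varphi \wedge \psi \in i(w)$ iff $\varphi \in i(w)$ and $\psi \in i(w)$
$\varphi \vee \psi \in i(w)$ iff $\varphi \in i(w)$ or $\psi \in i(w)$
$\varphi \to \psi \in i(w)$ iff $\varphi \in i(w')$ implies $\psi \in i(w')$, $\forall w' \geq w$
$\neg\varphi \in i(w)$ iff $\varphi \notin i(w')$, $\forall w' \geq w$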

together with the following rules governing strongly negated formulas: 



${\sim}(\varphi \wedge \psi) \in i(w)$ iff ${\sim}\varphi \in i(w)$ or ${\sim}\psi \in i(w)$
${\sim}(\varphi \vee \psi) \in i(w)$ iff ${\sim}\varphi \in i(w)$ and ${\sim}\psi \in i(w)$
${\sim}(\varphi \to \psi) \in i(w)$ iff $\varphi \in i(w)$ and ${\sim}\psi \in i(w)$
${\sim}\neg\varphi \in i(w)$ iff $\varphi \in i(w)$
${\sim}{\sim}\varphi \in i(w)$ iff $\varphi \in i(w)$

It is convenient to represent an N5-model as an ordered pair $(H, T)$ of sets of literals,
where $H = i(h)$ and $T = i(t)$ under a suitable assignment $i$. By $h \leq t$, it follows
that $H \subseteq T$. Again, by extending $i$ inductively we know what it means for an arbitrary
formula $\varphi$ to be true in a model $(H, T)$.

A formula $\varphi$ is true in a here-and-there model $\mathcal{M} = (H, T)$, in symbols $\mathcal{M} \models \varphi$, if it
is true at each world in $\mathcal{M}$. A formula $\varphi$ is said to be valid in N5, in symbols $\models \varphi$, if it is
true in all here-and-there models. Logical consequence for N5 is understood as follows:
$\varphi$ is said to be an N5-consequence of a set $\Pi$ of formulas, written $\Pi \models \varphi$, iff for all
models $\mathcal{M}$ and any world $w \in \mathcal{M}$, $\mathcal{M}, w \models \Pi$ implies $\mathcal{M}, w \models \varphi$. Equivalently this
can be expressed by saying that $\varphi$ is true in all models of $\Pi$. By strong completeness,
we have $\Pi \models \varphi \Leftrightarrow \Pi \vdash \varphi$, for all $\Pi$, $\varphi$. Further properties of N5 are studied in
[19]. By adding to N5 the axiom schema $\neg\neg\alpha \to \alpha$ we obtain a 3-valued logic, called
classical logic with strong negation, denoted by N3. This logic is complete for total
models consisting of a single world. It was studied in the first-order case by Gurevich
[14] and independently in the propositional case by Vakarelov [35] who showed that it
is equivalent, via a suitable translation, to Łukasiewicz’s 3-valued logic. Tableau calculi
for both N5 and N3 can be found in [28]. For the logic of here-and-there, ie N5 without
strong negation, a tableau calculus is presented in [1].

6 We use the term ‘literal’ to denote an atom, or atom prefixed by strong negation.

Equilibrium models are special kinds of minimal N5 Kripke models. We first define
a partial ordering $\trianglelefteq$ on N5 models.

Definition 1. Given any two models $(H, T)$, $(H', T')$, we set $(H, T) \trianglelefteq (H', T')$ if $T = T'$
and $H \subseteq H'$.

Definition 2. Let $\Pi$ be a set of N5 formulas and $(H, T)$ a model of $\Pi$.

1. $(H, T)$ is said to be total if $H = T$.

2. $(H, T)$ is said to be an equilibrium model of $\Pi$ if it is minimal under $\trianglelefteq$ among
models of $\Pi$, and it is total.

In other words a model $(H, T)$ of $\Pi$ is in equilibrium if it is total and there is no
model $(H', T)$ of $\Pi$ with $H' \subset H$. Equilibrium logic is the logic determined by the
equilibrium models of a theory, ie a set of N5 sentences. A formula $\varphi$ is said to be
an equilibrium consequence of a theory $\Pi$, in symbols $\Pi \mathrel{|\!\sim} \varphi$, if $\varphi$ is true in each
equilibrium model of $\Pi$; if $\Pi$ has no equilibrium models, we set $\mathrel{|\!\sim} \,=\, \models$. Two theories
$\Pi$ and $\Pi'$ are said to be logically equivalent, in symbols $\Pi \equiv \Pi'$, if they have the
same (N5) models; by completeness they are therefore inter-derivable. They are said to
be simply equivalent if they have the same equilibrium models and strongly equivalent,
in symbols $\Pi \equiv_s \Pi'$, if $\Pi \cup \Sigma$ and $\Pi' \cup \Sigma$ have the same equilibrium models, for any
set of sentences $\Sigma$. If the latter equivalence holds where $\Sigma$ is restricted to being a set of
atoms, then $\Pi$ and $\Pi'$ are said to be uniformly equivalent [10].
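
For small theories the definitions above can be explored by brute force. The following is a minimal sketch, assuming a hypothetical tuple encoding of formulas and restricting attention to the fragment without strong negation, so that literals are simply atoms; the function names are illustrative and not taken from any existing solver.

```python
from itertools import chain, combinations


def subsets(atoms):
    """All subsets of a collection of atoms, smallest first."""
    atoms = sorted(atoms)
    picks = chain.from_iterable(combinations(atoms, k) for k in range(len(atoms) + 1))
    return [set(pick) for pick in picks]


def holds(formula, H, T, world):
    """Truth of a formula at world 'h' or 't' of the here-and-there model (H, T)."""
    kind = formula[0]
    later = ("h", "t") if world == "h" else ("t",)  # the worlds w' with w <= w'
    if kind == "atom":
        return formula[1] in (H if world == "h" else T)
    if kind == "and":
        return holds(formula[1], H, T, world) and holds(formula[2], H, T, world)
    if kind == "or":
        return holds(formula[1], H, T, world) or holds(formula[2], H, T, world)
    if kind == "imp":
        return all(not holds(formula[1], H, T, w) or holds(formula[2], H, T, w) for w in later)
    if kind == "not":
        return all(not holds(formula[1], H, T, w) for w in later)
    raise ValueError("unknown connective: %r" % (kind,))


def is_model(theory, H, T):
    """(H, T) is a model if every formula is true at both worlds."""
    return all(holds(f, H, T, w) for f in theory for w in ("h", "t"))


def equilibrium_models(theory, atoms):
    """Total models (T, T) such that no (H, T) with H a proper subset of T is a model."""
    result = []
    for T in subsets(atoms):
        if not is_model(theory, T, T):
            continue
        if any(H < T and is_model(theory, H, T) for H in subsets(T)):
            continue
        result.append(T)
    return result


a, b = ("atom", "a"), ("atom", "b")

if __name__ == "__main__":
    print(equilibrium_models([("imp", ("not", a), b)], {"a", "b"}))
    print(equilibrium_models([("imp", ("not", a), a)], {"a"}))
```

The first theory is the rule $\neg a \to b$ and has the single equilibrium model $\{b\}$; the second is $\neg a \to a$, which has a total model $\{a\}$ that is not minimal, and therefore no equilibrium model at all.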

Equilibrium logic generalises answer set semantics in the following sense. For all
the usual classes of logic programs, including normal, extended, disjunctive and nested
programs, equilibrium models correspond to answer sets [27, 22]. The ‘translation’
from the syntax of programs to N5 propositional formulas is the trivial one, eg. a ground
rule of a disjunctive program of the form

$$K_1 \vee \ldots \vee K_k \leftarrow L_1, \ldots, L_m, not\, L_{m+1}, \ldots, not\, L_n,$$

where the $L_i$ and $K_j$ are literals, corresponds to the N5 sentence

$$L_1 \wedge \ldots \wedge L_m \wedge \neg L_{m+1} \wedge \ldots \wedge \neg L_n \to K_1 \vee \ldots \vee K_k$$

Proposition 1 ([27, 22]). For any logic program $\Pi$, an N5 model $(T, T)$ is an
equilibrium model of $\Pi$ if and only if $T$ is an answer set of $\Pi$.

In this paper we shall for the most part consider not arbitrary theories but an extension of
the usual syntax of disjunctive programs. Specifically we allow, besides strong negation,
also weak negation to occur in the heads of program rules. So program formulas, also
called rules, are of the form

$$L_1 \wedge \ldots \wedge L_m \wedge \neg L_{m+1} \wedge \ldots \wedge \neg L_n \to K_1 \vee \ldots \vee K_k \vee \neg K_{k+1} \vee \ldots \vee \neg K_l \qquad (1)$$

We call such programs extended disjunctive programs. They were introduced and
studied under answer set semantics in [21]. As shown by [23], every nested logic program
is strongly equivalent to a program of this kind. When needed, we abbreviate rules of
form (1) in the following way. The set of literals $L_1, \ldots, L_m$ comprising the positive
body of $r$ is denoted by $B^+(r)$ and the set of literals $L_{m+1}, \ldots, L_n$ comprising the
negative body of $r$ is denoted by $B^-(r)$. Likewise, we set $H^+(r) = K_1, \ldots, K_k$ and
$H^-(r) = K_{k+1}, \ldots, K_l$, for the positive and negative heads, respectively. With a slight
abuse of notation we can then re-write a rule $r$ of form (1) as

$$B^+(r) \wedge \neg B^-(r) \to H^+(r) \vee \neg H^-(r) \qquad (2)$$

As mentioned already, N5 captures strong equivalence between theories, hence between
logic programs under answer set semantics, as follows.

Proposition 2 ([22]). Any two theories $\Pi$ and $\Pi'$ are strongly equivalent iff they are
logically equivalent, ie. $\Pi \equiv_s \Pi'$ iff $\Pi \equiv \Pi'$.

Other characterisations of strong equivalence can be found in [34, 16]; uniform
equivalence is characterised in [10, 32].

We shall not formalise here the notion of transformation rule (see [4] for a more
detailed account). For our purposes, any set of operations that can be applied
systematically to a program or theory to yield another program or theory can be regarded as
a transformation rule. In answer set programming one is mainly interested in rules that
transform a program $\Pi$ into a program $\Pi'$ equivalent to $\Pi$ in one of the above senses.
Let us say that a transformation rule is N5-valid if it takes a theory or program $\Pi$ to
a theory or program $\Pi'$ that is logically equivalent to $\Pi$. An immediate corollary of
Proposition 2 is that a transformation rule preserves strong equivalence if and only if it
is N5-valid. It is interesting to note that N5 is a maximal logic with this property.

4 Program Simplification 

Eiter et al [11] consider several syntactic and semantic transformation rules which may
allow disjunctive logic programs to be simplified under strong or uniform equivalence.
In other words, by applying such rules one transforms a program $\Pi$ to a program $\Pi'$
that is either uniformly or strongly equivalent to $\Pi$. As they observe, the rules TAUT,
RED⁻, NONMIN, WGPPE, CONTRA and S-IMP preserve strong equivalence. They
do not mention, however, that each of these rules is easily seen to be valid in the logic
N5, which thus guarantees preservation of strong equivalence. This is essentially the
method by which [25] demonstrates the validity of TAUT, RED⁻ and NONMIN in the
case of programs without strong negation⁷.

7 In [25] the authors actually make use of a weaker intermediate logic, that of weak excluded
middle, WEM. This is an inessential difference because, as [16] have shown, all logics between
WEM and here-and-there characterise strong equivalence for programs up to the syntax of
nested expressions.

To illustrate how the method works, let us consider the rule S-IMP, due to Wang
and Zhou [38] and discussed in [11]. As in the case of NONMIN this is a kind of
subsumption rule allowing one to eliminate a rule that is less specific than another rule
belonging to the program. As [11] points out, if a rule $r$ of $\Pi$ stands in the S-IMP
relation to a rule $r'$ of $\Pi$, then $\Pi \equiv_s \Pi \setminus \{r'\}$. Viewed in more logical terms one is here
applying the principle:

$$r \vdash r' \;\&\; r, r' \in \Pi \;\Rightarrow\; \Pi \equiv \Pi \setminus \{r'\}.$$

By definition, $r$ stands in the S-IMP relation to $r'$, in symbols $r \rhd r'$, iff there exists
a set $A \subseteq B^-(r')$ such that (i) $H(r) \subseteq H(r') \cup A$; (ii) $B^-(r) \subseteq B^-(r') \setminus A$; (iii)
$B^+(r) \subseteq B^+(r')$.
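
The S-IMP relation is purely syntactic and can be tested directly from conditions (i) to (iii). The sketch below is illustrative only: it assumes disjunctive rules without negation in the head, encodes a rule as a triple of sets (head, positive body, negative body), and searches exhaustively over the candidate sets $A$; the function name `s_imp` is hypothetical.

```python
from itertools import combinations


def s_imp(r, r_prime):
    """Return True if r stands in the S-IMP relation to r_prime.

    A rule is a triple (head, positive_body, negative_body) of sets of literals.
    """
    head, pos, neg = r
    head_p, pos_p, neg_p = r_prime
    candidates = sorted(neg_p)
    # Try every subset A of the negative body of r_prime.
    for size in range(len(candidates) + 1):
        for chosen in combinations(candidates, size):
            A = set(chosen)
            if head <= (head_p | A) and neg <= (neg_p - A) and pos <= pos_p:
                return True
    return False


# r:   a v c <- b.          r':  a <- b, not c.
r = ({"a", "c"}, {"b"}, set())
r_prime = ({"a"}, {"b"}, {"c"})

if __name__ == "__main__":
    print(s_imp(r, r_prime))   # witnessed by A = {c}
    print(s_imp(r_prime, r))   # no suitable A exists
```

In the first call the witness is $A = \{c\}$, so $r'$ could be dropped from any program containing both rules; the exhaustive search is exponential in the size of $B^-(r')$ and is chosen here only because it mirrors the definition literally.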

Proposition 3 ([11]). The rule S-IMP preserves strong equivalence.

Proof. To verify validity we need only show that if $r \rhd r'$, then $r \vdash r'$. By (i) we can
write $r$ in the form $r = B^+(r) \wedge \neg B^-(r) \to H_1(r) \vee H_2(r)$ where $H_1(r) \subseteq H(r')$ and
$H_2(r) \cap H(r') = \emptyset$. It follows that $H_2(r) \subseteq A$. Then clearly $r \vdash B^+(r) \wedge \neg B^-(r) \wedge \neg H_2(r) \to H_1(r)$.
Since $H_2(r) \subseteq A$, we have $r \vdash B^+(r) \wedge \neg B^-(r) \wedge \neg A \to H_1(r)$
and so by (ii) also $r \vdash B^+(r) \wedge \neg B^-(r') \to H_1(r)$. Applying (i) and (iii), strengthening
the antecedent and weakening the consequent, we have $r \vdash B^+(r') \wedge \neg B^-(r') \to H(r')$, ie. $r \vdash r'$.

Eiter et al [11] also mention the following semantic rule SUPRA of supraclassicality,
discussed by [4]:

$$\Pi \models_c A \;\Rightarrow\; \Pi \equiv_e \Pi \cup \{A\},$$

where $\models_c$ is classical consequence and $A$ is an atom. It is easy to see that this rule is
not generally valid where say $\equiv_e$ is equivalence under stable models or WFS. A simple
counterexample is the program $\Pi$ comprising the single rule $\neg a \to a$ which classically
derives $a$. But the stable model and well-founded semantics of $\Pi \cup \{a\}$ are clearly
different from those of $\Pi$⁸. Thus [11] make no further use of SUPRA; yet this deprives
them of a powerful tool for simplifying programs. First note that SUPRA is valid for
answer set semantics in the following restricted form:

$$\Pi \models_c \neg A \;\Rightarrow\; \Pi \equiv_s \Pi \cup \{\neg A\}.$$

Second, already [26] pointed out that SUPRA is perfectly valid if classical consequence
is replaced by the weaker consequence relation of intuitionistic logic, or its strong
negation extension, Nelson’s constructive logic. By the later results of [27] it follows
immediately that SUPRA is valid in the following stronger form:

$$\Pi \models \varphi \;\Rightarrow\; \Pi \equiv \Pi \cup \{\varphi\} \quad\text{hence also}\quad \Pi \models \varphi \;\Rightarrow\; \Pi \equiv_s \Pi \cup \{\varphi\}.$$

Note too that $\models$ is a maximal consequence relation with this property. In this form
SUPRA can be used as a basis to transform and simplify programs according to a
straightforward set of derived rules. The key idea is that if $\Pi \models \varphi$ we transform $\Pi$
to a simpler program $\Pi'$ with the property

$$\Pi \equiv \Pi' \cup \{\varphi\}.$$

8 For ordinary, but not strong, equivalence under stable model semantics SUPRA holds in the
restricted case that $\Pi$ has a stable model. But clearly it cannot be used to simplify a program
if the existence of a stable model is not guaranteed.

We consider two kinds of rules, which we denote by DEC⁺ and DEC⁻, according to
whether the formula $\varphi$ is a literal $L$ or its negation $\neg L$. In each case we let $L$ range over
literals and we consider the transformation of $\Pi$ to $\Pi'$. We denote the complement of a
literal $L$ by $L^*$ (ie for an atom $A$, $A^* = {\sim}A$, $({\sim}A)^* = A$). We assume that an extended
disjunctive program $\Pi$ is grounded and $r$ ranges over rules in $\Pi$.

DEC⁺. Suppose that $\Pi \vdash L$. (a) Let

$$E = \{r : L^* \in B^+(r)\} \cup \{r : \neg L^* \in H(r)\} \cup \{r : \neg L \in B(r)\} \cup \{r : L \in H(r)\}.$$

Set $\Pi' = \Pi \setminus E \cup \{L\}$.

(b) $\Pi'$ can be further simplified by removing from it all other remaining occurrences
of $L$ in the bodies of its rules, all occurrences of $\neg L$ in heads as well as all occurrences
of $L^*$ and $\neg L^*$.
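
Both steps of DEC⁺ are simple syntactic operations once a derivable literal $L$ is known. The following sketch is a hedged illustration, not the paper's implementation: the `Rule` record with four literal sets, the convention of writing strong negation as a `~` prefix, and the function names are all assumptions, and the literal `L` is taken as an input, since deciding $\Pi \vdash L$ is outside the scope of the sketch.

```python
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    """Hypothetical record for B+(r), B-(r), H+(r), H-(r) as sets of literals."""
    pos_body: FrozenSet[str] = frozenset()
    neg_body: FrozenSet[str] = frozenset()
    pos_head: FrozenSet[str] = frozenset()
    neg_head: FrozenSet[str] = frozenset()


def complement(literal):
    """L* for a literal written as 'a' or '~a' (strong negation as a prefix)."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def dec_plus(program, L):
    """Apply DEC+ (a) and (b) for a literal L that is assumed to be derivable."""
    Lc = complement(L)
    kept = []
    for r in program:
        # Step (a): drop rules with L* in the positive body, not L* in the head,
        # not L in the body, or L in the head.
        if Lc in r.pos_body or Lc in r.neg_head or L in r.neg_body or L in r.pos_head:
            continue
        # Step (b): delete the remaining occurrences of L, not L, L* and not L*.
        kept.append(Rule(
            pos_body=r.pos_body - {L},
            neg_body=r.neg_body - {Lc},
            pos_head=r.pos_head - {Lc},
            neg_head=r.neg_head - {L},
        ))
    kept.append(Rule(pos_head=frozenset({L})))  # add the fact L
    return kept


program = [
    Rule(pos_head=frozenset({"a"})),                                   # a.
    Rule(pos_body=frozenset({"a"}), neg_body=frozenset({"c"}),
         pos_head=frozenset({"b"})),                                   # b <- a, not c.
    Rule(pos_body=frozenset({"~a"}), pos_head=frozenset({"c"})),       # c <- ~a.
    Rule(pos_body=frozenset({"e"}), pos_head=frozenset({"d", "~a"})),  # d v ~a <- e.
]

if __name__ == "__main__":
    for rule in dec_plus(program, "a"):
        print(rule)
```

On this example the rule with `~a` in its body disappears, `a` is removed from the body of the second rule, and `~a` is removed from the head of the last one, leaving the fact `a` together with `b <- not c` and `d <- e`.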

DEC⁻. Suppose $\Pi \vdash \neg L$ (but $\Pi \nvdash {\sim}L$; note that the case that $\Pi$ derives ${\sim}L$, the
complement of $L$, falls under case DEC⁺). (a) Let

$$E = \{r \in \Pi : L \in B^+(r)\} \cup \{r \in \Pi : \neg L \in H(r)\}.$$

Set $\Pi' = \Pi \setminus E \cup \{L \to {\sim}L\}$. Note that adding $L \to {\sim}L$ is the same as adding the
integrity constraint $\leftarrow L$ or the formula $\neg L$; one can choose one or other according to
preferred syntax.

(b) $\Pi'$ can be further simplified by deleting from it all remaining occurrences of
$\neg L$ except for the newly added constraint and all positive occurrences of $L$ (not in the
scope of $\neg$).

Proposition 4. Applications of each of the transformations DEC⁺(a),(b),
DEC⁻(a),(b) preserve strong equivalence. In particular we have $\Pi \equiv \Pi' \equiv_s \Pi$.

Proof Assume the pre-condition of DEC . Suppose r is a rule such that L* G B + (r). 
Since for any C, we have L h -4 C, we must also have L\-L*/\tp-^C\/tp 
for arbitrary p, 'if so L I r. Suppose on the other hand that —P appears in the body 

of a rule r. Again by the N 5 axioms, L I <L —>■ C for any C, so by strengthening 

the antecedent and weakening the consequent, clearly L h r. Now, if r contains an 
occurrence of L in its head, then immediately L\- r. Similarly, since L I — >7*. for any 
any rule r with an occurrence of - //" in its head, we also have L h r. So in each of the 
four cases r can be eliminated and 77 = 77'. 

Now consider any rule of 77' that contains an occurrence of L in its body. The only 
possibility is that L G B + (r) . Then r is of the form L A <p — » and it is clear that if r' 
is the result of deleting L from r, then L U r h r' and r' h r. Likewise, if L* G B~ (r) 
then r must be of the form A ip —* if), and the same argument applies. So any 
remaining occurrences of - 1 L* can be eliminated. Now, suppose r contains ->L in its 
head, ie is of the form p — > ~^L V f>, and let r' be the corresponding rule p if where 
L is eliminated. Then clearly /hr and r U L h r' by applying disjunctive syllogism. 
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Since L* I L, the same argument applies to any remaining occurence of L* which 

must be in the (positive) head of a rule r. 

Assume that the condition of DEC - holds and let r be a rule such that L £ B + (r). 

For arbitrary C we have ->7, I i->L — + C, hence ->L h L — > C, hence h L A </? — > 

C V ip, for any ip, ip. On the other hand, let r be such that ->L £ H(r). ->L h r 
follows immediately. So by construction 77' = II. Clearly the only possible remaining 
occurrences of ->L are in the bodies of rules r, ie where r is of the form ~^L A tp — > ip. 
Let r' = ip — > ip. Then clearly r'hr and r U -<L h r'. Now consider any occurrence 
of L in a rule r that is not in the scope of Then clearly L £ H(r ) and r has the 
form p — > LV ip. Deleting L to form the rule r 1 = p ip, we see that r' h r and 
r U -iL h r' . □ 

Rules DEC$^+$(a) and DEC$^-$(a) correspond to what in [11] is called removing
redundant rules (once simple additions have been made to the program), while rules
DEC$^+$(b) and DEC$^-$(b) correspond to rule simplification or condensation, where
literals are removed from a given rule. Notice that the condition for applying DEC$^-$,
that a literal $L$ is decided negatively, ie $\Pi \vdash \neg L$, holds if and only if $\neg L$ is classically
derivable from $\Pi$, or more correctly if $\Pi \models_{N_3} \neg L$.

The following example shows how the above rules may be applied in practice.
Consider the program $\Pi$ consisting of the following numbered rules:

r1.  $c \vee \neg d$
r2.  ${\sim}a$
r3.  $b \rightarrow a$
r4.  $e \wedge \neg b \rightarrow d \vee \neg c$
r5.  $\neg a \rightarrow e$
r6.  $c \wedge d \rightarrow a$
r7.  $a \wedge f \rightarrow g$
r8.  $\neg c \wedge \neg f \rightarrow g \vee a$
r9.  $\neg d \wedge \neg g \rightarrow f \vee \neg e$
r10. $h \rightarrow \neg d$
r11. $h \wedge c \rightarrow j$

It is natural to start by applying transformation rule DEC$^+$ to any literal that holds
as a fact in the program, in this case the literal ${\sim}a$. This produces the following
simplifications. Rule r5 becomes just $e$. r6 becomes

$$c \wedge d \rightarrow \bot.$$

r7 is deleted, and r8 becomes

$$\neg c \wedge \neg f \rightarrow g.$$

Next we apply the transformation using the fact that $e$ is derivable. Rule r4 now
simplifies to

$$\neg b \rightarrow d \vee \neg c$$

and r9 simplifies to $\neg d \wedge \neg g \rightarrow f$. Since no further literals have emerged as facts we
may now check whether some weakly negated literal is derivable. It is easily seen that
rules r2, r3 yield that $\neg b$ is derivable. This further simplifies r4 to $d \vee \neg c$. So at this
stage $\Pi'$ looks as follows.

r1.  $c \vee \neg d$
r2.  ${\sim}a$
r3.  $b \rightarrow {\sim}b$
r4.  $d \vee \neg c$
r5.  $e$
r6.  $c \wedge d \rightarrow \bot$
r8.  $\neg c \wedge \neg f \rightarrow g$
r9.  $\neg d \wedge \neg g \rightarrow f$
r10. $h \rightarrow \neg d$
r11. $h \wedge c \rightarrow j$

In this program, r1, r4, r6 resolve to $\neg c$, $\neg d$. So applying DEC$^-$ to these formulas,
the program now becomes

${\sim}a$                  $e$
$b \rightarrow {\sim}b$    $c \rightarrow {\sim}c$    $d \rightarrow {\sim}d$
$\neg f \rightarrow g$     $\neg g \rightarrow f$

whose answer sets are readily seen to be $\{{\sim}a, e, f\}$ and $\{{\sim}a, e, g\}$.
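
The steps of this example can be replayed mechanically. The following is only a minimal sketch under stated assumptions, not an implementation from the paper: a rule is encoded as a hypothetical 4-tuple `(head_pos, head_neg, body_pos, body_neg)` of literal sets, strong negation is written as a `~` prefix, and the functions `dec_plus` and `dec_minus` (names chosen here for illustration) take the derivability of the literal for granted rather than testing it in $N_5$.

```python
# A rule (head_pos, head_neg, body_pos, body_neg) stands for
#   body_pos AND not body_neg  ->  head_pos OR not head_neg,
# where "not" is weak negation and a leading "~" marks strong negation.
# This encoding is an assumption of the sketch, not notation from the paper.

def rule(head_pos=(), head_neg=(), body_pos=(), body_neg=()):
    return tuple(frozenset(part) for part in (head_pos, head_neg, body_pos, body_neg))

def complement(lit):
    """L* in the text: A* = ~A and (~A)* = A."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def dec_plus(program, lit):
    """DEC+ (a),(b); assumes the caller has established that lit is derivable."""
    comp = complement(lit)
    result = {rule(head_pos=[lit])}                 # the fact L is (re-)added
    for hp, hn, bp, bn in program:
        if comp in bp or comp in hn or lit in bn or lit in hp:
            continue                                # (a): the rule belongs to E
        # (b): drop L from bodies, not-L from heads, L* from heads, not-L* from bodies
        result.add((hp - {comp}, hn - {lit}, bp - {lit}, bn - {comp}))
    return result

def dec_minus(program, lit):
    """DEC- (a),(b); assumes the caller has established that not-lit is derivable."""
    result = {rule(head_pos=[complement(lit)], body_pos=[lit])}   # add L -> ~L
    for hp, hn, bp, bn in program:
        if lit in bp or lit in hn:
            continue                                # (a): the rule belongs to E
        result.add((hp - {lit}, hn, bp, bn - {lit}))  # (b): drop L from heads, not-L from bodies
    return result

program = {
    rule(head_pos=["c"], head_neg=["d"]),                                   # r1
    rule(head_pos=["~a"]),                                                  # r2
    rule(head_pos=["a"], body_pos=["b"]),                                   # r3
    rule(head_pos=["d"], head_neg=["c"], body_pos=["e"], body_neg=["b"]),   # r4
    rule(head_pos=["e"], body_neg=["a"]),                                   # r5
    rule(head_pos=["a"], body_pos=["c", "d"]),                              # r6
    rule(head_pos=["g"], body_pos=["a", "f"]),                              # r7
    rule(head_pos=["g", "a"], body_neg=["c", "f"]),                         # r8
    rule(head_pos=["f"], head_neg=["e"], body_neg=["d", "g"]),              # r9
    rule(head_neg=["d"], body_pos=["h"]),                                   # r10
    rule(head_pos=["j"], body_pos=["h", "c"]),                              # r11
}

# Same order as in the text: ~a and e positively, then b, c, d negatively.
simplified = dec_plus(dec_plus(program, "~a"), "e")
for lit in ("b", "c", "d"):
    simplified = dec_minus(simplified, lit)
```

In this encoding the intermediate rule r3 appears as the constraint $b \rightarrow \bot$ before DEC$^-$ replaces it by $b \rightarrow {\sim}b$; the two forms are interchangeable as noted in the definition of DEC$^-$.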

5 Decidable, Stable and Safe Literals 

In this section we explore some further uses for $N_5$-derivability. For any program $\Pi$ a
literal $L$ is said to be decidable if either of the pre-conditions of DEC$^+$ or DEC$^-$
obtain, in other words if either $\Pi \vdash L$ or $\Pi \vdash \neg L$; otherwise $L$ is said to be undecidable.
Let $Und(\Pi)$ be the set of undecidable literals $L$ in the language of $\Pi$ and let $Neg(\Pi)$
be the set of literals $L$ such that $\neg L$ appears in some rule in $\Pi$ other than the degenerate
rule $\neg L$. Then we have shown in particular the following:

Corollary 1 ([27]).$^9$ Any program $\Pi$ is strongly equivalent to a program $\Pi'$ such that
$Neg(\Pi') \subseteq Und(\Pi')$; in other words a program in which all decidable literals in
$Neg(\Pi)$ are eliminated.

Proof. Apply rules DEC$^+$ and DEC$^-$ to all decidable literals to form a strongly
equivalent program $\Pi'$. Let $L$ be any decidable literal such that $\neg L$ appears in some rule $r$ in
$\Pi$. Rule DEC$^+$(a) deletes $r$ if $\neg L \in B(r)$ and DEC$^+$(b) deletes an occurrence of $\neg L$
if $\neg L \in H(r)$. On the other hand, rule DEC$^-$(a) deletes a rule $r$ such that $\neg L \in H(r)$
and DEC$^-$(b) deletes any remaining occurrence of $\neg L$. Hence all decidable
occurrences of $L$ that appear in the scope of $\neg$ are eliminated. □

We have seen that $L$ might be decidable (negatively) without ${\sim}L$ being decidable.
This is precisely case DEC$^-$. In this case although we eliminate occurrences of $\neg L$ and
positive occurrences of $L$, the method does not license us to eliminate any remaining
occurrences of $L$ in the scope of strong negation, nor equivalently occurrences of $L^*$ in
the case that $L$ is already a negative literal.

The set $Neg(\Pi)$ of the weakly negated literals of a program clearly plays a
fundamental role in answer set semantics. The above corollary ensures that by monotonic
means we can essentially eliminate all occurrences of negation except those that are in
a sense genuinely 'nonmonotonic'. In fact, in the limit where all literals in $Neg(\Pi)$ are
decidable, $\Pi$ behaves purely monotonically, as the following shows:

Proposition 5. Suppose that each literal $L \in Neg(\Pi)$ is decidable. Then for all $\varphi$,
$\Pi \mathrel{|\!\sim} \varphi \Leftrightarrow \Pi \models \varphi$.

A different but related notion is that of stability, a concept due to van Dantzig [7],
see also Dummett [9]. Originally introduced in the context of intuitionistic mathematics,
it is applicable to any superintuitionistic logic (we state it for $N_5$):

Definition 3. A formula $\varphi$ is said to be stable in a theory $\Pi$ if $\Pi \vdash \neg\neg\varphi \rightarrow \varphi$.

9 In [27] this result is stated without proof for the case of ordinary disjunctive programs and
ordinary equivalence.

Obviously a decidable literal is stable, but the converse need not hold. While the de- 
cidable literals in a program are relevant for determining the composition of its answer 
sets, the stable literals are relevant for guaranteeing the existence of answer sets. Again 
what is important here is not the collection of all literals of the program but those that 
are prefixed by weak negation. 

Proposition 6. A consistent program $\Pi$ has an answer set if each literal in $Neg(\Pi)$ is
stable in $\Pi$.

Proof. Assume that each $L \in Neg(\Pi)$ is stable. We will show that $\Pi$ has an
equilibrium model. Suppose not. Then for any $\subseteq$-minimal total model $\langle T, T \rangle$ of $\Pi$, there
is an $H \subset T$ such that $\langle H, T \rangle \models \Pi$. So for each rule $r \in \Pi$ of form (1), we have
$\langle H, T \rangle \models r$. In particular by the truth conditions for $r$ at world $h$, this means

$$B^+(r) \subseteq H \;\&\; B^-(r) \cap T = \emptyset \;\Rightarrow\; H^+(r) \cap H \neq \emptyset \text{ or } H^-(r) \not\subseteq T$$

By assumption each weakly negated literal $L$ in $r$ is stable, so $\langle H, T \rangle \models \neg\neg L \rightarrow L$.
Therefore $L \in H \Leftrightarrow L \in T$ for each $L \in Neg(\Pi)$. So the above truth condition can
be re-written as

$$B^+(r) \subseteq H \;\&\; B^-(r) \cap H = \emptyset \;\Rightarrow\; H^+(r) \cap H \neq \emptyset \text{ or } H^-(r) \not\subseteq H$$

But this implies that $\langle H, H \rangle \models r$, which is impossible by the minimality of $\langle T, T \rangle$.
This contradicts the initial assumption. □

It is easy to see that the condition of stability is quite different from purely syntactic 
conditions such as signings, stratifications etc. The reader may readily construct simple 
examples that satisfy the condition of Proposition 6 but which do not possess signings 
or stratifications. Proposition 6 provides an unexpected link between two historically 
distinct and apparently quite independent concepts of stability. Not surprisingly, we 
cannot strengthen Proposition 6 so as to provide also a necessary condition for the ex- 
istence of an answer set. Any attempt is bound to fail on purely complexity-theoretic 
grounds. Deciding whether a literal is stable is a co-NP complete problem, while
deciding whether a nested or disjunctive logic program has an answer set is $\Sigma_2^P$-complete
[12,29,30].

Any literal $L$ that is decidable positively in a program $\Pi$, ie $\Pi \vdash L$, not only
belongs to every answer set of $\Pi$ but enjoys the stronger property of belonging to every
answer set of any extension of $\Pi$ (providing it has an answer set). We may call such
a literal safe for $\Pi$, since it cannot be defeated by extending $\Pi$ in any consistent way.
Conversely we may call a literal defeasible if in some extension of $\Pi$ it is no longer
nonmonotonically derivable. The following definition makes this precise.

Definition 4. Let $\varphi$ be a formula such that $\Pi \mathrel{|\!\sim} \varphi$. $\varphi$ is said to be safe for $\Pi$ if for any
$\Sigma$, $\Pi \cup \Sigma \mathrel{|\!\sim} \varphi$, whenever $\Pi \cup \Sigma$ has an equilibrium model or answer set; otherwise $\varphi$
is said to be defeasible in $\Pi$.

Now, in order to be safe a literal $L$ need not be monotonically derivable from $\Pi$ in $N_5$.
In fact it is easy to see that it suffices that $\Pi \vdash \neg\neg L$, since then $\neg\neg L$ must be derivable
in every extension of $\Pi$ and so $L$ must be true in any answer sets possessed by such an
extension. So $L$ is safe in $\Pi$ if $\Pi \vdash \neg\neg L$ (or equivalently $\Pi \models_{N_3} L$). Further reflection
shows that this condition is also necessary, as the following makes clear (we state it for
arbitrary theories in equilibrium logic):

Proposition 7. Let $\Pi$ be any theory possessing an equilibrium model. A literal $L$ is
safe for $\Pi$ iff $\Pi \vdash \neg\neg L$.

Proof. It remains to check necessity. Thus, suppose that $\Pi \not\vdash \neg\neg L$. We need to show
that there is an extension $\Sigma$ of $\Pi$ such that $\Sigma \not\mathrel{|\!\sim} L$. By the assumption $\Pi \cup \{\neg\neg\neg L\}$
($= \Pi \cup \{\neg L\}$) is consistent so it has an $N_5$-model, say $\langle H, T \rangle$, in which $L$ is false, ie
$L \notin T$. Clearly also $\langle T, T \rangle \models \Pi$ and is an equilibrium model of $\Pi \cup T$. Hence there
is an extension $\Pi \cup T$ of $\Pi$ such that $\Pi \cup T \not\mathrel{|\!\sim} L$. □

To illustrate the above concepts, consider the program $\Pi$ comprising the following
rules:

$${\sim}a; \quad \neg b \rightarrow a; \quad \neg c \rightarrow b; \quad \neg d \rightarrow e; \quad \neg q \rightarrow p; \quad q \rightarrow \neg p$$

whose single answer set is $\{{\sim}a, b, e, p\}$. Evidently ${\sim}a$ is decidable (hence stable and
safe). It is easy to check that $p$ is stable but is undecidable and defeasible, $b$ is safe but
unstable, while $e$ is defeasible and unstable.
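
The claims about answer sets in this example can be checked by brute force over here-and-there pairs $\langle H, T \rangle$, using the truth condition for rules stated in the proof of Proposition 6. The sketch below is illustrative only and rests on assumptions introduced here: rules are hypothetical 4-tuples `(head_pos, head_neg, body_pos, body_neg)`, strong negation is a `~` prefix, only consistent sets of literals are considered, and the second rule of the program is read as $\neg b \rightarrow a$. It demonstrates defeasibility by exhibiting a defeating extension; it does not establish safety, which quantifies over all extensions.

```python
from itertools import combinations

def rule(head_pos=(), head_neg=(), body_pos=(), body_neg=()):
    return tuple(frozenset(part) for part in (head_pos, head_neg, body_pos, body_neg))

def complement(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def satisfies(r, here, there):
    """Truth of a rule in the here-and-there pair <here, there>."""
    hp, hn, bp, bn = r
    def at(world):
        body = bp <= world and not (bn & there)
        head = bool(hp & world) or not (hn <= there)
        return head or not body
    return at(here) and at(there)

def subsets(items):
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def answer_sets(program, literals):
    """Consistent total models <T, T> with no proper H below T satisfying the program."""
    found = set()
    for there in subsets(literals):
        if any(complement(lit) in there for lit in there):
            continue                                   # skip inconsistent sets
        if not all(satisfies(r, there, there) for r in program):
            continue
        smaller = (h for h in subsets(there) if h != there)
        if not any(all(satisfies(r, h, there) for r in program) for h in smaller):
            found.add(there)
    return found

literals = {"a", "~a", "b", "c", "d", "e", "p", "q"}
program = {
    rule(head_pos=["~a"]),
    rule(head_pos=["a"], body_neg=["b"]),
    rule(head_pos=["b"], body_neg=["c"]),
    rule(head_pos=["e"], body_neg=["d"]),
    rule(head_pos=["p"], body_neg=["q"]),
    rule(head_neg=["p"], body_pos=["q"]),
}

base = answer_sets(program, literals)
with_q = answer_sets(program | {rule(head_pos=["q"])}, literals)   # defeats p
with_d = answer_sets(program | {rule(head_pos=["d"])}, literals)   # defeats e
with_c = answer_sets(program | {rule(head_pos=["c"])}, literals)   # no answer set left
```

Adding the fact $c$ removes the only support for $b$ while $\neg\neg b$ stays derivable, so this particular extension has no answer set at all, which is compatible with $b$ being safe.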

6 Some Complexity and Implementation Issues 

Complexity results for reasoning in ASP are well-discussed in the literature. For
disjunctive logic programs (with strong negation) deciding whether a program has an
answer set is, as mentioned above, $\Sigma_2^P$-complete [12]. Deciding for some $\varphi$ whether
$\Pi \mathrel{|\!\sim} \varphi$ is $\Pi_2^P$-complete. The corresponding complexity classes for programs with nested
expressions [23] and for propositional theories in equilibrium logic are the same [29,
30]. So complexity does not alter when the syntax is extended in this way. We already
observed that each nested logic program is strongly equivalent to an extended
disjunctive program, as shown in [23]. The transformation that performs this reduction is not
generally polynomial, however [31]. In [31] a polynomial reduction of this kind is
presented, where however new atoms are added to the language. This method has been
implemented in a compiler, nlp, publicly available as a front-end to the disjunctive
logic programming system DLV$^{10}$. In [17] it was shown how extended disjunctive
programs can be polynomially reduced to ordinary disjunctive programs, again using a
similar extension of the language with new atoms.

Deciding whether a formula $\varphi$ is an $N_5$-consequence of a theory $\Pi$ is of lower
complexity: this problem is coNP-complete [29,30], see also [24]. Therefore the
pre-conditions for applying the rules DEC$^+$ and DEC$^-$ are of this complexity, as are the
problems of checking whether a literal is stable or safe. Clearly, once the pre-condition
of a rule has been verified for some literal, the algorithm executing the rule is linear for
that literal, hence quadratic if repeated for all literals. As noted, a tableaux system for
checking $N_5$-consequence was presented in [29]. Since this system can also be used
to verify the strong equivalence of programs, a workable implementation is desirable.
Currently a prototype implementation is under construction at the University of Malaga.

10 See http://www.cs.uni-potsdam.de/torsten/nlp

7 Related Work and Conclusions

A standard reference for the study of transformation rules in logic programs with default
negation and disjunction is the work of Brass and Dix [4] who consider the behaviour
of a number of transformation rules with respect to the main types of logic
programming semantics. Brass and Dix are concerned mainly with simplifying rules preserving
(ordinary) equivalence of programs; in some cases the rules are strong enough actually
to compute the semantics of a program. In [27] it was observed for the first time that
a nonclassical logic, that of here-and-there with strong negation, $N_5$, could be useful
to discover or check the validity of transformation rules preserving equivalence under
answer set semantics; the present paper pursues this idea further. We already noted that
some simplification rules for ASP were discussed by [25] making use of the weaker
superintuitionistic logic of weak excluded middle. [25] also consider extended disjunctive
programs, but without the presence of strong negation. In [11] Eiter et al discuss
simplifying rules preserving strong equivalence as well as some rules preserving uniform
equivalence (in special cases). However, while they employ a characterisation of strong
equivalence essentially equivalent to Proposition 2 above, they make no further use of
$N_5$-inference as a means to test the validity of existing rules and search for new ones.
The logic $N_5$ without strong negation (ie the logic of here-and-there) is used in [31] to
verify strong equivalence preserving rules. However the main aim there is to show how
arbitrary nested programs can be reduced to disjunctive ones.

In this paper, we have considered several new ways in which inference in the
nonclassical logic $N_5$ of here-and-there with strong negation can be useful for program
simplification under answer set semantics. First, for any kind of program it provides a
simple check to verify whether a putative transformation rule preserves strong
equivalence in answer set semantics or equilibrium logic, as illustrated here for the rule S-IMP.
Secondly, we discussed two new transformation rules DEC$^+$ and DEC$^-$ based on
$N_5$ inference that can be used to simplify extended disjunctive programs. One feature,
in particular, is that they eliminate all decidable occurrences of literals in the scope of
weak negation (negation-as-failure). Third, we showed how varying the kinds of
formulas to be tested for $N_5$-derivability from a program can provide further information that
is useful from a computational point of view: for example we may carry out a partial
check for the existence of an answer set as well as distinguish between the defeasible
and non-defeasible literals in a program.

Several lines of work lie open for future investigation. First, it seems that further
study of inference in $N_5$ is likely to result in discovering new ways of simplifying
programs in ASP. Secondly, since uniform equivalence is also characterised by a simple
structural condition on $N_5$-models ([10,32]), we may hope to use this to extend the
work of [11] on transformation rules preserving uniform equivalence. Thirdly, once an
efficient implementation of an $N_5$ theorem prover is available, we would like to
experiment with adding this, extended with an implementation of the DEC$^+$ and DEC$^-$
rules, as a pre-processor to systems such as DLV; specifically as a module fitting
between the program grounding and the model generation processes. Fourthly, and more
ambitiously, it would be desirable to have a full first-order version of equilibrium logic,
based on a suitable first-order version of the logic $N_5$ and which is sound and
complete with respect to current ASP solvers. In this case one could search for simplifying
transformation rules that apply prior to grounding and perhaps thereby achieve more
extensive efficiency gains.

Abstract. We define the class of head-cycle free nested logic programs, and its
proper subclass of acyclic nested programs, generalising similar classes originally 
defined for disjunctive logic programs. We then extend several results known for 
acyclic and head-cycle free disjunctive programs under the stable-model seman- 
tics to the nested case. Most notably, we provide a propositional semantics for 
the program classes under consideration. This generalises different extensions 
of Fages' theorem, including a recent result by Erdem and Lifschitz for tight 
logic programs. We further show that, based on a shifting method, head-cycle free 
nested programs can be rewritten into normal programs in polynomial time and 
space, extending a similar technique for head-cycle free disjunctive programs. All 
this shows that head-cycle free nested programs constitute a subclass of nested 
programs possessing a lower computational complexity than arbitrary nested pro- 
grams, providing the polynomial hierarchy does not collapse. 



1 Introduction 

This paper deals with generalisations and refinements of several reducibility results for 
nested logic programs (NLPs) under the stable-model semantics. This class of programs 
is characterised by the condition that arbitrarily nested formulas, formed from atoms us- 
ing negation as failure, conjunction, and disjunction, serve as bodies and heads of rules, 
extending the well-known classes of normal logic programs (nLPs), disjunctive logic 
programs (DLPs), and generalised disjunctive logic programs (GDLPs). Nested logic 
programs under the stable-model semantics (or rather under the answer-set semantics, 
by allowing also strong negation) were introduced by Lifschitz, Tang, and Turner [18], 
and currently receive increasing interest in the literature, both from a logical as well as 
from a computational point of view. 

In complexity theory, a frontier is identified having DLPs, GDLPs and NLPs on
the one side, and nLPs and so-called nested normal programs (NnLPs), for which only
positive literals are allowed as heads of rules (cf. Table 1 below), on the other side.
For the former program classes, the main reasoning tasks lie at the second level of the
polynomial hierarchy [9, 26], while for the latter classes, the main reasoning tasks have
NP complexity [24, 2]$^1$. There are various translatability results between the different
syntactic subclasses of programs. Among them, there are translations between nested
programs and GDLPs [18], and between DLPs and nLPs [8], both requiring exponential
space in the worst case. Additionally, there exist linear-time constructible translations
between NLPs and DLPs [25], and between GDLPs and DLPs [14, 15]. Note that,
unless the polynomial hierarchy collapses, the above mentioned complexity gap does not
allow for polynomial translations between, e.g., nested logic programs and normal logic
programs. However, one can look for subclasses of NLPs where such a translation is
possible.

* This work was partially supported by the Austrian Science Fund (FWF) under projects
Z29-N04 and P15068-INF, by the German Science Foundation (DFG) under grants FOR 375/1 and
SCHA 550/6, TP C, as well as by the European Commission under projects FET-2001-37004
WASP and IST-2001-33570 INFOMIX.

B. Demoen and V. Lifschitz (Eds.): ICLP 2004, LNCS 3132, pp. 225-239, 2004.
© Springer-Verlag Berlin Heidelberg 2004

Thomas Linke, Hans Tompits, and Stefan Woltran

In this paper, we identify non-trivial subclasses of nested programs for which we 
establish two forms of reductions: 

1. reductions to classical propositional logic; and 

2. reductions to normal logic programs. 

More specifically, we introduce the class of head-cycle free (HCF) nested programs
and its proper subclass of acyclic nested programs. Both program classes are defined
as generalisations of similar kinds of programs originally introduced as syntactic sub- 
classes of disjunctive logic programs a decade ago by Ben-Eliyahu and Dechter [1]. 
Moreover, the reductions we provide here are, on the one hand, extensions of previous 
results, established for more restricted kinds of programs, and, on the other hand, with 
respect to head-cycle free and acyclic nested programs, optimisations of general trans- 
latability results developed in [26,25]. We detail the main aspects of our results in the 
following. 

Concerning the reduction to classical propositional logic, we construct mappings,
$T[\cdot]$ and $T^*[\cdot]$, assigning to each program a propositional theory such that

1. given an acyclic nested program $\Pi$, the stable models of $\Pi$ are given by the models
of the classical theory $T[\Pi]$; and

2. given a head-cycle free nested program $\Pi$, the stable models of $\Pi$ are given by sets
of form $I \cap V$, where $I$ is a model of the classical theory $T^*[\Pi]$ and $V$ is the set of
atoms occurring in $\Pi$.

In both cases, the size of the assigned classical theory is polynomial in the size of
the input program. Moreover, the translation $T^*[\cdot]$ is defined using newly introduced
auxiliary variables, whereas for $T[\cdot]$, no new variables are required.

These results are generalisations of similar characterisations given by Ben-Eliyahu
and Dechter [1] for acyclic and head-cycle free DLPs. Moreover, our results generalise
results relating the stable-model semantics to Clark's completion [5]. Recall that Clark's
completion was one of the first semantics proposed for programs containing default
negation, in which a normal logic program $\Pi$ is associated with a propositional theory,
COMP[$\Pi$], called the completion of $\Pi$. Although every stable model of $\Pi$ is also a
model of the completion of $\Pi$, the converse does not always hold. In fact, Fages [12]
showed that the converse holds providing $\Pi$ satisfies certain syntactic restrictions. Our
results generalise Fages' characterisation in the sense that, if $\Pi$ is a normal program,
then both $T[\Pi]$ and $T^*[\Pi]$ coincide with COMP[$\Pi$].

1 The NP-completeness for NnLPs can be derived from a translation from NnLPs to nLPs due
to You, Yuan, and Zhang [29].

Fages' theorem was generalised in several directions: Erdem and Lifschitz [10, 11]
extended it for NnLPs, providing the given programs are tight. We extend the notion of
tightness to arbitrary nested programs and refine our results by showing that

if a nested program $\Pi$ is HCF and tight on an interpretation $I$, then $I$ is a stable
model of $\Pi$ iff $I$ is a model of $T[\Pi]$.

Other generalisations of Fages' theorem drop the syntactic proviso but add instead
additional so-called loop formulas guaranteeing equivalence between the stable models of
the given program and the classical models of the resultant theory. This idea was
pursued by Lin and Zhao [19] for normal programs and subsequently extended by Lee and
Lifschitz [17] for disjunctive programs with nested formulas in rule bodies. In contrast
to Clark's completion for normal programs, the size of the resultant theories in these
approaches is in the worst case exponential in the size of the input programs. We further
note that, for the sort of programs dealt with in [17], the notion of completion defined
there coincides with our transformation $T[\cdot]$.

The reductions to classical propositional logic allow us also to draw immediate 
complexity results for acyclic and HCF nested programs. As noted above, the main 
reasoning tasks associated with arbitrary nested programs lie at the second level of the 
polynomial hierarchy [26], whereas our current results imply that analogous tasks for 
acyclic and HCF nested programs have NP or co-NP complexity (depending on the 
specific reasoning task). Thus, providing the polynomial hierarchy does not collapse, 
acyclic and HCF programs are computationally simpler than arbitrary nested programs. 

Let us now turn to our results concerning the reductions to normal logic programs. 
As was shown by Ben-Eliyahu and Dechter [1], HCF disjunctive programs can be 
transformed into equivalent normal programs by shifting head atoms into the body 
(cf. also [6, 13]). For instance, a rule of form $p \vee q \leftarrow r$ is replaced by this method
by the two rules $p \leftarrow r, \neg q$ and $q \leftarrow r, \neg p$ (where "$\neg$" denotes the negation-as-failure
operator). We generalise this method for HCF nested programs, obtaining a polynomial 
reduction from HCF nested programs (and thus, in particular, also from acyclic nested 
programs) into nLPs. Note that applying such a shifting technique for programs which 
are not HCF does in general not retain the stable models. 
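
The shifting step for ordinary disjunctive rules can be sketched as follows. This is a minimal illustration under assumptions made here, not the generalised method developed in the paper: a normal rule is a hypothetical triple `(head, positive_body, negative_body)`, and `stable_models` is a naive enumeration suitable only for tiny programs. The second usage example shifts the program $p \vee q \leftarrow$, $p \leftarrow q$, $q \leftarrow p$, whose head atoms depend positively on each other and whose only stable model is $\{p, q\}$; after shifting, no stable model remains.

```python
from itertools import combinations

def shift(head_atoms, body_pos=(), body_neg=()):
    """Turn  h1 v ... v hk <- body  into k normal rules, moving the other
    head atoms into the body under negation as failure."""
    rules = []
    for atom in head_atoms:
        others = tuple(h for h in head_atoms if h != atom)
        rules.append((atom, tuple(body_pos), tuple(body_neg) + others))
    return rules

def least_model(positive_rules):
    """Least model of a negation-free normal program given as (head, body) pairs."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if set(pos) <= model and head not in model:
                model.add(head)
                changed = True
    return model

def stable_models(rules, atoms):
    """Naive stable-model enumeration via the Gelfond-Lifschitz reduct."""
    atoms = sorted(atoms)
    found = []
    for size in range(len(atoms) + 1):
        for combo in combinations(atoms, size):
            candidate = set(combo)
            reduct = [(h, pos) for h, pos, neg in rules if not (set(neg) & candidate)]
            if least_model(reduct) == candidate:
                found.append(candidate)
    return found

# The rule from the text: p v q <- r.
shifted_rule = shift(["p", "q"], body_pos=["r"])

# Shifting p v q <- alone keeps the stable models {p} and {q}.
hcf_case = stable_models(shift(["p", "q"]), {"p", "q"})

# With p <- q and q <- p added, the shifted program loses the stable model {p, q}.
cyclic = shift(["p", "q"]) + [("p", ("q",), ()), ("q", ("p",), ())]
cyclic_case = stable_models(cyclic, {"p", "q"})
```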

Prior to our work, Inoue and Sakama [14] already defined the notions of acyclicity
and head-cycle freeness for generalised disjunctive programs, extending the respective
notions introduced in [1]. They showed that GDLPs satisfying either of these extended
notions can likewise be transformed to nLPs by shifting head atoms to the bodies
of rules and thus have the same worst-case complexity as normal programs. However, 
their notions of acyclicity and head-cycle freeness are more restrictive than ours, with 
respect to GDLPs, and hence our results hold for a larger class of programs. 

We finally note that, at first glance, one may attempt to construct a polynomial
translation from acyclic or head-cycle free NLPs into normal programs or into classical
propositional logic by first applying the polynomial translation from NLPs to DLPs
due to Pearce et al. [25], and afterwards by transforming the resultant programs into
either normal programs or into classical logic using the results of Ben-Eliyahu and
Dechter [1]. However, since the translation of [25] preserves neither acyclicity nor
head-cycle freeness, such an approach does not work in general.

The paper is organised as follows. The next section supplies some background on 
logic programs and the stable-model semantics. In Section 3, we introduce acyclic and 
head-cycle free nested programs and show some invariance theorems. Section 4 dis- 
cusses the reductions to classical propositional logic, and Section 5 deals with the gen- 
eralised shifting technique for reducing HCF programs into nLPs. Section 6 is devoted 
to tight nested programs, and Section 7 concludes the paper. 

2 Preliminaries 

We deal with propositional languages and use the logical symbols $\top$, $\bot$, $\neg$, $\vee$, $\wedge$, $\rightarrow$,
and $\leftrightarrow$ to construct formulas over propositional variables (or atoms) in the usual way.
A formula using only $\wedge$, $\vee$, or $\neg$ as its sentential connectives is called an expression.
Literals are formulas of form $v$ or $\neg v$, where $v$ is some variable, or one of $\top$, $\bot$. We
refer to a literal of form $v$ (where $v$ is as before) as a positive literal and to a literal of
form $\neg v$ as a negative literal. Disjunctions of form $\bigvee_{i \in I} \phi_i$ are assumed to stand for the
logical constant $\bot$ whenever $I = \emptyset$, and, likewise, conjunctions of form $\bigwedge_{i \in I} \phi_i$ with
$I = \emptyset$ stand for $\top$. The set of all atoms occurring in a formula $\phi$ is denoted by $Atm(\phi)$.

By an interpretation, $I$, we understand a set of variables. Informally, a variable $v$ is
true under $I$ iff $v \in I$. Interpretations induce truth values (in the sense of classical logic)
of arbitrary formulas in the usual way. The set of models of a formula $\phi$ is denoted by
$Mod(\phi)$. Two formulas, $\phi$ and $\psi$, are (logically) equivalent iff $Mod(\phi) = Mod(\psi)$.
For a set $V$ and a family of sets $S$, by $S|_V$ we denote the family $\{I \cap V \mid I \in S\}$.

The fundamental objects of our investigation are nested logic programs (NLPs), 
introduced by Lifschitz, Tang, and Turner [18]. NLPs are characterised by the condition 
that the bodies and heads of rules are given by arbitrary expressions as defined above. 
For reasons of simplicity, we deal here only with languages containing one kind of 
negation, corresponding to default negation. Therefore, $\neg$ refers to default negation,
whenever used in logic programs. 

In more formal terms, a rule, $r$, is a pair of form

$$H(r) \leftarrow B(r),$$

where $B(r)$ and $H(r)$ are expressions. $B(r)$ is called the body of $r$ and $H(r)$ is the
head of $r$. If $B(r) = \top$, then $r$ is a fact, and if $H(r) = \bot$, then $r$ is a constraint. A
nested logic program, or simply a program, is a finite set of rules.

We associate to every program $\Pi$ the propositional formula

$$c(\Pi) = \bigwedge_{r \in \Pi} (B(r) \rightarrow H(r)).$$

Furthermore, $Atm(\Pi)$ denotes the set of all atoms occurring in program $\Pi$.

Let $\Pi$ be a program and $I$ an interpretation. Then, the reduct, $\Pi^I$, of $\Pi$ with respect
to $I$ is obtained from $\Pi$ by replacing every occurrence of an expression $\neg\psi$ in $\Pi$ which
is not in the scope of any other negation by $\bot$ if $\psi$ is true under $I$, and by $\top$ otherwise.
$I$ is an answer set (or stable model) of $\Pi$ iff it is a minimal model (with respect to set
inclusion) of $c(\Pi^I)$. The collection of all answer sets of $\Pi$ is denoted by $AS(\Pi)$. Two
logic programs, $\Pi_1$ and $\Pi_2$, are equivalent iff $AS(\Pi_1) = AS(\Pi_2)$.

Table 1. Classes of programs.

Class   Heads                     Bodies
NLP     expression                expression
GDLP    disjunction of literals   conjunction of literals
DLP     disjunction of atoms      conjunction of literals
NnLP    positive literal          expression
nLP     positive literal          conjunction of literals
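
The definitions of reduct and answer set translate directly into a naive procedure. The following sketch is for illustration only and assumes a representation introduced here: expressions are nested tuples built by the hypothetical helpers `atom`, `neg`, `conj`, and `disj`, and a program is a list of `(head, body)` pairs. It enumerates all interpretations, so it is practical only for very small programs.

```python
from itertools import combinations

TOP, BOT = ("top",), ("bot",)

def atom(name): return ("atom", name)
def neg(e): return ("not", e)
def conj(a, b): return ("and", a, b)
def disj(a, b): return ("or", a, b)

def holds(e, interp):
    """Classical truth value of an expression under an interpretation (set of atoms)."""
    tag = e[0]
    if tag == "top": return True
    if tag == "bot": return False
    if tag == "atom": return e[1] in interp
    if tag == "not": return not holds(e[1], interp)
    if tag == "and": return holds(e[1], interp) and holds(e[2], interp)
    return holds(e[1], interp) or holds(e[2], interp)      # tag == "or"

def reduct(e, interp):
    """Replace each outermost negation (not psi) by BOT if psi is true, else by TOP."""
    tag = e[0]
    if tag == "not":
        return BOT if holds(e[1], interp) else TOP
    if tag in ("and", "or"):
        return (tag, reduct(e[1], interp), reduct(e[2], interp))
    return e

def is_model(program, interp):
    return all(holds(head, interp) or not holds(body, interp) for head, body in program)

def answer_sets(program, atoms):
    atoms = sorted(atoms)
    candidates = [frozenset(c) for k in range(len(atoms) + 1)
                  for c in combinations(atoms, k)]
    found = []
    for interp in candidates:
        red = [(reduct(h, interp), reduct(b, interp)) for h, b in program]
        if not is_model(red, interp):
            continue
        if any(j < interp and is_model(red, j) for j in candidates):
            continue                                       # not minimal
        found.append(interp)
    return found

p, q = atom("p"), atom("q")
positive = [(disj(p, q), TOP), (p, q), (q, p)]
double_negated = [(disj(p, q), TOP), (p, neg(neg(q))), (q, neg(neg(p)))]
single_negated = [(disj(p, q), TOP), (p, neg(q)), (q, neg(p))]
```

Note the contrast between the last two programs: with doubly negated bodies the only answer set is $\{p, q\}$, whereas with singly negated bodies the answer sets are $\{p\}$ and $\{q\}$.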

By restricting the syntactic form of bodies and heads of rules, different classes of 
programs are identified. Besides NLPs, for our purposes, the following classes are of 
interest: generalised disjunctive logic programs (GDLPs), disjunctive logic programs 
(DLPs), nested normal logic programs (NnLPs), and normal logic programs (nLPs). 
Table 1 summarises the defining attributes of these classes. 

Following Lloyd and Topor [23] (cf. also [11]), we define the completion of an
NnLP $\Pi$ as the propositional formula

$$\mathrm{COMP}[\Pi] = \bigwedge_{p \in A} \Big( p \leftrightarrow \bigvee_{r \in \Pi,\, H(r) = p} B(r) \Big),$$

where $A = Atm(\Pi) \cup \{\bot\}$.
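
A small experiment makes the gap between completion and stable models concrete. The sketch below is restricted, as an assumption of this illustration, to normal programs without constraints, with rules given as hypothetical triples `(head, positive_body, negative_body)`; it therefore omits the conjunct for $\bot$ in the definition above. For the one-rule program $p \leftarrow p$ the completion $p \leftrightarrow p$ has the models $\emptyset$ and $\{p\}$, while the only stable model is $\emptyset$.

```python
from itertools import combinations

def interpretations(atoms):
    atoms = sorted(atoms)
    for size in range(len(atoms) + 1):
        for combo in combinations(atoms, size):
            yield set(combo)

def body_true(pos, neg, interp):
    return set(pos) <= interp and not (set(neg) & interp)

def completion_models(rules, atoms):
    """Interpretations in which every atom is true iff some rule body for it is true."""
    models = []
    for interp in interpretations(atoms):
        if all((a in interp) == any(body_true(pos, neg, interp)
                                    for head, pos, neg in rules if head == a)
               for a in atoms):
            models.append(interp)
    return models

def least_model(positive_rules):
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if set(pos) <= model and head not in model:
                model.add(head)
                changed = True
    return model

def stable_models(rules, atoms):
    found = []
    for interp in interpretations(atoms):
        reduct = [(h, pos) for h, pos, neg in rules if not (set(neg) & interp)]
        if least_model(reduct) == interp:
            found.append(interp)
    return found

loop = [("p", ("p",), ())]          # p <- p
no_loop = [("p", (), ("q",))]       # p <- not q
```

For `no_loop`, which has no positive dependencies at all, both functions return the single interpretation $\{p\}$.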

Finally, we recall some graph-theoretical notations. A (directed) graph, $G$, is a pair
$(V, E)$ such that $V$ is a finite set of nodes and $E \subseteq V \times V$ is a set of edges. A path from
$v$ to $v'$ in $G$ is a sequence $P_{v,v'} = (v_1, \ldots, v_n)$ of nodes such that $v = v_1$, $v' = v_n$,
and $(v_i, v_{i+1}) \in E$, for each $1 \leq i < n$. A graph $G = (V, E)$ is acyclic iff, for each
node $v \in V$, there is no path from $v$ to itself. A strongly connected component (or
component, for short) of a graph $G$ is a maximal set $S$ of nodes such that, for any two
nodes $p$ and $q$ in $S$, there is a path from $p$ to $q$ in $G$. Strongly connected components
can be identified in linear time [28]. The size of a component is the length (i.e., number
of edges) of the longest acyclic path in it.

3 Acyclic and Head-Cycle Free Nested Programs

We start our formal elaboration by recalling the notion of a dependency graph for nested 
logic programs. Based on this, we define acyclic and head-cycle free nested programs, 
and show that these notions are invariant with respect to rewritings into DLPs. 

We commence with the following auxiliary notions. 

Definition 1. Let $p$ be an atom and $\psi$ an expression. Then, the polarity of a specific
occurrence of $p$ in $\psi$ is positive iff it is not in the scope of a negation, and negative
otherwise. The set of atoms having a positive occurrence in $\psi$ is denoted by $Atm^+(\psi)$.
For a program $\Pi$, we define $Atm^+(\Pi) = \bigcup_{r \in \Pi} (Atm^+(H(r)) \cup Atm^+(B(r)))$.
With this notation at hand, we are able to define the concept of dependency graphs
for NLPs, following Lee and Lifschitz [17].

Definition 2. The (positive) dependency graph of a program $\Pi$ is given by $G_\Pi =
(Atm(\Pi), E_\Pi)$, where $E_\Pi \subseteq Atm(\Pi) \times Atm(\Pi)$ is defined by the condition that
$(p, q) \in E_\Pi$ iff there is some $r \in \Pi$ such that $p \in Atm^+(B(r))$ and $q \in Atm^+(H(r))$.

Our first category of programs is then introduced as follows: 

Definition 3. A nested logic program $\Pi$ is acyclic iff its dependency graph $G_\Pi$ is
acyclic.

It is a straightforward matter to check that this definition generalises acyclic DLPs
as introduced by Ben-Eliyahu and Dechter [1], i.e., a DLP $\Pi$ is acyclic in the sense of
Definition 3 iff it is acyclic in the sense of [1].

Example 1. Consider the following two programs:

$\Pi_1 = \{p \lor q \leftarrow;\ p \leftarrow q;\ q \leftarrow p\}$;  $\Pi_2 = \{p \lor q \leftarrow;\ p \leftarrow \neg\neg q;\ q \leftarrow \neg\neg p\}$.

Programs $\Pi_1$ and $\Pi_2$ have dependency graphs $G_{\Pi_1} = (\{p, q\}, \{(p, q), (q, p)\})$ and
$G_{\Pi_2} = (\{p, q\}, \emptyset)$, respectively. Thus, $\Pi_1$ is not acyclic, whereas $\Pi_2$ is. One may
verify that both programs have the same stable models, viz. $AS(\Pi_1) = AS(\Pi_2) =
\{\{p, q\}\}$.
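
The dependency graphs of Example 1 can be recomputed mechanically. In the following Python sketch, each rule is abstracted to the pair of atom sets that occur positively in its head and in its body; this abstraction, like the function names, is an assumption of the sketch, and it suffices here because Definition 2 refers only to positive occurrences.

```python
def dependency_edges(program):
    """Edges (p, q): p occurs positively in a body, q positively in that rule's head."""
    return {(p, q) for head_pos, body_pos in program
            for p in body_pos for q in head_pos}

def is_acyclic(nodes, edges):
    """Depth-first search for a back edge; True iff no node reaches itself."""
    succ = {v: [] for v in nodes}
    for p, q in edges:
        succ[p].append(q)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in nodes}

    def visit(v):
        colour[v] = GREY
        for w in succ[v]:
            if colour[w] == GREY:
                return False          # back edge: a cycle was found
            if colour[w] == WHITE and not visit(w):
                return False
        colour[v] = BLACK
        return True

    return all(colour[v] != WHITE or visit(v) for v in nodes)

# (positive head atoms, positive body atoms) for each rule of Example 1
pi1 = [({"p", "q"}, set()), ({"p"}, {"q"}), ({"q"}, {"p"})]
pi2 = [({"p", "q"}, set()), ({"p"}, set()), ({"q"}, set())]  # negated atoms are dropped
```

With this abstraction, `pi1` yields the two edges between `p` and `q` and is reported as cyclic, while `pi2` has no edges at all.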

Next, we generalise the notion of a head-cycle free DLP to the class of NLPs. To 
this end, we need the following definition. 

Definition 4. Two distinct atoms, $p$ and $q$, are joint-positive in an expression $\varphi$ iff there
exists a subformula $\varphi_1 \lor \varphi_2$ of $\varphi$ with $p \in Atm^+(\varphi_1)$ and $q \in Atm^+(\varphi_2)$, or vice
versa. Moreover, $p$ and $q$ are called head-sharing in a program $\Pi$ iff $p$ and $q$ are
joint-positive in $H(r)$, for some $r \in \Pi$.

From this, the class of head-cycle free NLPs is characterised in the following way:

Definition 5. A nested program $\Pi$ is head-cycle free (HCF) iff its dependency graph
$G_\Pi$ does not contain a directed cycle going through two head-sharing atoms in $\Pi$.

Again, it can be shown that a DLP $\Pi$ is HCF in the above sense iff it is HCF in the
sense of [1]. Thus, the class of HCF NLPs is a proper generalisation of the class of HCF
DLPs. Furthermore, it is easy to see that every acyclic NLP is HCF.

Example 2. Consider the programs $\Pi_1$ and $\Pi_2$ from Example 1. Observe that $p$ and $q$
are head-sharing in both $\Pi_1$ and $\Pi_2$. Hence, $\Pi_1$ is not HCF, since there is a cycle in
$G_{\Pi_1}$ involving $p$ and $q$. On the other hand, $\Pi_2$ is HCF, and, as we already know from
the above, acyclic.
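
Head-cycle freeness can be checked along the same lines. The Python sketch below is restricted, as an assumption, to rules whose heads are plain disjunctions of atoms, so that any two distinct head atoms of a rule are head-sharing; rules are again abstracted to their positive head and body atoms, and the function names are hypothetical.

```python
from itertools import combinations

def reachable(edges, start):
    """All nodes reachable from start via at least one edge."""
    seen, stack = set(), [start]
    while stack:
        v = stack.pop()
        for a, b in edges:
            if a == v and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def is_hcf(program):
    """False iff some pair of head-sharing atoms lies on a common directed cycle."""
    edges = {(p, q) for head_pos, body_pos in program
             for p in body_pos for q in head_pos}
    for head_pos, _ in program:
        for p, q in combinations(sorted(head_pos), 2):
            if q in reachable(edges, p) and p in reachable(edges, q):
                return False
    return True

pi1 = [({"p", "q"}, set()), ({"p"}, {"q"}), ({"q"}, {"p"})]
pi2 = [({"p", "q"}, set()), ({"p"}, set()), ({"q"}, set())]
```

The check reports `pi1` as not HCF and `pi2` as HCF, in line with Example 2.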

In what follows, we review the translations introduced by Lifschitz, Tang, and 
Turner [18] and Janhunen [15], which, jointly applied, allow for translating any nested 
program into a DLP via the substitutions (L1)-(L12) and (J) from Table 2. Observe that 
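Table 2. Replacements in logic programs. It is assumed that $\varphi$, $\psi$, and $\vartheta$ are expressions, $A$ is a
set of atoms, $p$ is an atom, $L$ and $L_p$ are new atoms, and $\circ \in \{\land, \lor\}$.

Name     Occurrence $\alpha$                                 Replaced by $\beta$
------   -------------------------------------------------   -------------------------------------------------
(L1)     $\varphi \circ \psi$                                $\psi \circ \varphi$
(L2)     $(\varphi \circ \psi) \circ \vartheta$              $\varphi \circ (\psi \circ \vartheta)$
(L3)     $\varphi \circ \varphi$                             $\varphi$
(L4)     $(\varphi \land \psi) \lor \vartheta$               $(\varphi \lor \vartheta) \land (\psi \lor \vartheta)$
(L5)     $(\varphi \lor \psi) \land \vartheta$               $(\varphi \land \vartheta) \lor (\psi \land \vartheta)$
(L6)     $\neg(\varphi \lor \psi)$                           $(\neg\varphi \land \neg\psi)$
(L7)     $\neg(\varphi \land \psi)$                          $(\neg\varphi \lor \neg\psi)$
(L8)     $\neg\neg\neg\varphi$                               $\neg\varphi$
(L9)     $\varphi \leftarrow \psi \lor \vartheta$            $\varphi \leftarrow \psi$;  $\varphi \leftarrow \vartheta$
(L10)    $\varphi \land \psi \leftarrow \vartheta$           $\varphi \leftarrow \vartheta$;  $\psi \leftarrow \vartheta$
(L11)    $\varphi \leftarrow \psi \land \neg\neg\vartheta$   $\varphi \lor \neg\vartheta \leftarrow \psi$
(L12)    $\neg\neg\varphi \lor \psi \leftarrow \vartheta$    $\psi \leftarrow \vartheta \land \neg\varphi$
(J)      $\neg p \lor \psi \leftarrow \varphi$               $L_p \lor \psi \leftarrow \varphi$;  $\bot \leftarrow p \land L_p$;  $L_p \leftarrow \neg p$
(S) 1    $\varphi \lor \psi \leftarrow \vartheta$            $\varphi \leftarrow \vartheta \land \neg\psi$;  $\psi \leftarrow \vartheta \land \neg\varphi$
(T*) 2   $\varphi \leftarrow \psi$                           $\{\varphi[(A \setminus \{p\})/\top] \leftarrow \psi[p/\bot];\ \varphi[p/\top] \leftarrow \psi[(A \setminus \{p\})/\bot] \mid p \in A\}$
(D)      $\varphi \lor \psi \leftarrow \vartheta$            $\varphi \lor L \leftarrow \vartheta$;  $\psi \leftarrow L$
(Y1)     $p \leftarrow \psi \land (\varphi \lor \vartheta)$  $p \leftarrow \psi \land L$;  $L \leftarrow \varphi \lor \vartheta$
(Y2)     $p \leftarrow \psi \land \neg\neg q$                $p \leftarrow \psi \land \neg L$;  $L \leftarrow \neg q$
(C)      $\varphi \land \psi \leftarrow \vartheta$           $L \leftarrow \vartheta$;  $\varphi \leftarrow L$;  $\psi \leftarrow L$

1 applicable only if $Atm^+(\varphi) \cap Atm^+(\psi) = \emptyset$.

2 applicable only for non-empty $A \subseteq Atm^+(\varphi) \cap Atm^+(\psi)$.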



the DLPs obtained in this way may be exponential in the size of the respective input
programs. Our goal is to show that our notions of acyclicity and head-cycle freeness for
NLPs are invariant with respect to the applications of these substitutions.

Any substitution $\sigma$ from Table 2 is applied as follows: We say that a program $\Pi'$ is
obtained from $\Pi$ via $\sigma$ by replacing² an occurrence of an expression, or a single rule $\alpha$,
by $\beta$, which itself is an expression, a rule, or a set of rules. Moreover, $\vartheta[p/x]$ denotes the
replacement of all positive occurrences of an atom $p$ in $\vartheta$ by expression $x$. Accordingly,
for a set of atoms $S$, $\vartheta[S/x]$ denotes the replacement of all positive occurrences of
$p \in S$ in $\vartheta$ by $x$. Thus, $\vartheta[S/x] = \vartheta$ whenever $S \cap Atm^+(\vartheta)$ is empty. We sometimes
use a substitution $\sigma$ in the reverse way, i.e., replacing $\beta$ by $\alpha$. This is made explicit by
writing $\sigma^\leftarrow$. We note that in this section we require only a part of the translations given
in Table 2; the remaining ones are needed later on.

We start with the translation from NLPs to GDLPs due to Lifschitz, Tang, and
Turner [18]. This translation is based on substitutions (L1)-(L12) from Table 2.

Proposition 1 ([18]). Let $\Pi$ be a nested program.

Then, for every program $\Pi'$ obtained from $\Pi$ via any substitution from (L1)-(L12),
it holds that $AS(\Pi) = AS(\Pi')$. Moreover, there is a program $\Pi''$ obtained from $\Pi$
via a sequence of substitutions from (L1)-(L12) such that $\Pi''$ is a GDLP and $AS(\Pi) =
AS(\Pi'')$.

² For (L11), (L12), (J), (Y1), and (Y2), we allow the component $\psi$ in $\alpha$ to be vacuous; in this
case, for $\beta$, $\psi$ is set to $\top$ in (L11) and to $\bot$ in (L12).

Next, we close the gap between GDLPs and DLPs. To this end, we require the
substitution rule (J) of Table 2, which is a generalised stepwise variant of a labeling
technique discussed in [15], introducing a globally new atom $L_p$, for any atom $p$.
Observe that acyclicity and head-cycle freeness are invariant with respect to applications
of substitutions (L1)-(L12) and (J). More precisely, (L1)-(L12) preserve both the
dependency graph and the pairs of head-sharing atoms. An application of Substitution (J),
however, changes the dependency graph $G_\Pi = (Atm(\Pi), E_\Pi)$, for a given program
$\Pi$, to $G'_\Pi = (Atm(\Pi) \cup \{L_p\}, E_\Pi \cup \{(q, L_p) \mid q \in Atm^+(\varphi)\})$ and yields
additional pairs $q$ and $L_p$ of head-sharing atoms, for any $q \in Atm^+(\psi)$ (cf. Table 2), but no
additional cycles are introduced in $G'_\Pi$. This gives us the desired result.

Theorem 1. Let $\Pi$ be a nested program, and let $\Pi'$ be obtained from $\Pi$ by applying
any sequence of substitutions from (L1)-(L12) and (J).

Then, the following properties hold:

1. $AS(\Pi) = AS(\Pi')|_{Atm(\Pi)}$;

2. $\Pi$ is acyclic iff $\Pi'$ is acyclic; and

3. $\Pi$ is HCF iff $\Pi'$ is HCF.

This theorem states that the properties of being acyclic and of being HCF are
invariant with respect to any sequence of substitutions from (L1)-(L12) and (J). The next
theorem demonstrates that substitutions (L1)-(L12) and (J) are sufficient to transform
a given NLP into a corresponding DLP.

Theorem 2. For any nested program $\Pi$, there is a program $\Pi'$ obtained from $\Pi$ via
a sequence of substitutions from (L1)-(L12) and (J) obeying the conditions from
Theorem 1 and such that $\Pi'$ is a DLP.

Example 3. For $\Pi_2$ from Example 1, we derive the following DLP by applying
substitutions (L11) and (J) to both of the last two rules of $\Pi_2$:

$\Pi' = \{p \lor q \leftarrow;\ p \lor L_q \leftarrow;\ q \lor L_p \leftarrow\} \cup \{\bot \leftarrow v \land L_v;\ L_v \leftarrow \neg v \mid v \in \{p, q\}\}$.

The dependency graph of this program is $(\{p, q, L_p, L_q\}, \emptyset)$, and thus it is still HCF and
acyclic. As well, the only stable model of $\Pi'$ is $\{p, q\}$.

4 Reductions to Classical Propositional Logic 

We now proceed with assigning a propositional semantics to acyclic and head-cycle
free nested programs, in the sense that a program $\Pi$ is transformed into a propositional
formula $\varphi$ such that the answer sets of $\Pi$ are given by the models of $\varphi$. Observing that
these encodings yield propositional formulas whose sizes are polynomial in the sizes of
the input programs, we also draw some immediate complexity results.

We have the following building blocks. Let $\Pi$ be a nested program. For any $p \in
Atm^+(\Pi)$ occurring in a strongly connected component of size $l > 1$ in $G_\Pi$, we
introduce globally new variables $p_1, \ldots, p_k$, where $k = \lceil \log_2(l - 1) \rceil$. For two atoms
$p$, $q$ occurring in the same component of size $l > 1$ of the dependency graph, we define

k k k 

prec n [q,p\ = f\(q t -> \J Pj ) A f\(.Pi — > q,f 

i— 1 j=i i— 1 

Informally, $prec_\Pi[\cdot, \cdot]$ assigns a strict partial order to the atoms in $\Pi$, based on a binary
encoding technique.

Now we are ready to define our two main transformations, $T[\cdot]$ and $T^*[\cdot]$, from
nested logic programs into formulas of propositional logic.

Definition 6. Let $\Pi$ be a nested program, and let $\Pi_p$ be the program resulting from
$\Pi$ by taking those rules $r \in \Pi$ where $p \in Atm^+(H(r))$ and replacing each positive
occurrence of $p$ in a head by $\bot$. Furthermore, let $\Pi^*_p$ be the program resulting from
$\Pi_p$ by replacing each positive occurrence of an atom $q \neq p$ in a body by the formula
$q \land prec_\Pi[q, p]$, providing $q$ is in the same component as $p$ in $G_\Pi$.
Then, define

$$T[\Pi] = c(\Pi) \land \bigwedge_{p \in Atm(\Pi)} \bigl( p \to \neg c(\Pi_p) \bigr) \quad \text{and}$$

$$T^*[\Pi] = c(\Pi) \land \bigwedge_{p \in Atm(\Pi)} \bigl( p \to \neg c(\Pi^*_p) \bigr).$$

Note that both the size of $T[\Pi]$ and the size of $T^*[\Pi]$ are polynomial in the
size of $\Pi$. Furthermore, if $\Pi$ is an NnLP, then $T[\Pi]$ is equivalent to the completion
$COMP[\Pi]$, and if $\Pi$ is an acyclic NLP, then $T^*[\Pi] = T[\Pi]$. Moreover, it can be shown
that, for any DLP $\Pi$, the theories $T[\Pi]$ and $T^*[\Pi]$ are equivalent to the encodings
given by Ben-Eliyahu and Dechter [1] for acyclic and HCF DLPs, respectively. The
main characterisations of [1] can thus be paraphrased as follows:

Proposition 2 ([1]). For any DLP $\Pi$,

1. if $\Pi$ is acyclic, then $I \in AS(\Pi)$ iff $I \in Mod(T[\Pi])$, for all $I \subseteq Atm(\Pi)$; and
2. if $\Pi$ is HCF, then $AS(\Pi) = Mod(T^*[\Pi])|_{Atm(\Pi)}$.

The restriction in the second result is used to “hide” the newly introduced variables
in formulas $prec_\Pi[\cdot, \cdot]$ in $T^*[\Pi]$.

With the next results, we generalise Proposition 2 to HCF and acyclic nested pro- 
grams. To begin with, we have the following theorem. 

Theorem 3. Let $\Pi$ be a HCF nested logic program.

Then, $AS(\Pi) = Mod(T^*[\Pi])|_{Atm(\Pi)}$.

This theorem is proved by showing that models of $T^*[\cdot]$ are invariant (modulo the
introduction of new atoms) under substitution rules (L1)-(L12) and (J).

As an immediate consequence, we obtain the following corollary for acyclic nested
programs.

Corollary 1. Let $\Pi$ be an acyclic nested logic program, and let $I \subseteq Atm(\Pi)$.

Then, $I \in AS(\Pi)$ iff $I \in Mod(T[\Pi])$.
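
The transformation $T[\cdot]$ can be evaluated by brute force on a small acyclic program. The following Python sketch is a hedged illustration, not the authors' implementation: it assumes that $c(\Pi)$ is the classical reading of $\Pi$ (each rule read as the implication from body to head), it uses a hypothetical tuple encoding of expressions, and it reads the last two rules of $\Pi_2$ from Example 1 as $p \leftarrow \neg\neg q$ and $q \leftarrow \neg\neg p$.

```python
from itertools import combinations

# Hypothetical encoding: ("atom", "p"), ("top",), ("bot",), ("not", e),
# ("and", e1, e2), ("or", e1, e2); a rule is a pair (head, body).

def holds(expr, interp):
    tag = expr[0]
    if tag == "atom":
        return expr[1] in interp
    if tag == "top":
        return True
    if tag == "bot":
        return False
    if tag == "not":
        return not holds(expr[1], interp)
    if tag == "and":
        return holds(expr[1], interp) and holds(expr[2], interp)
    return holds(expr[1], interp) or holds(expr[2], interp)  # "or"

def all_atoms(expr):
    if expr[0] == "atom":
        return {expr[1]}
    return set().union(*(all_atoms(sub) for sub in expr[1:]))

def pos_atoms(expr):
    """Atoms with an occurrence outside the scope of any negation."""
    tag = expr[0]
    if tag == "atom":
        return {expr[1]}
    if tag in ("and", "or"):
        return pos_atoms(expr[1]) | pos_atoms(expr[2])
    return set()

def drop_pos(expr, p):
    """Replace each positive occurrence of atom p by bot."""
    if expr == ("atom", p):
        return ("bot",)
    if expr[0] in ("and", "or"):
        return (expr[0], drop_pos(expr[1], p), drop_pos(expr[2], p))
    return expr

def classical(program, interp):
    """Assumed reading of c(.): every rule holds as an implication body -> head."""
    return all(holds(h, interp) or not holds(b, interp) for h, b in program)

def t_models(program):
    universe = sorted(set().union(*(all_atoms(h) | all_atoms(b) for h, b in program)))
    models = []
    for size in range(len(universe) + 1):
        for combo in combinations(universe, size):
            interp = frozenset(combo)
            ok = classical(program, interp)
            for p in interp:  # p -> not c(Pi_p) only constrains true atoms
                pi_p = [(drop_pos(h, p), b) for h, b in program if p in pos_atoms(h)]
                ok = ok and not classical(pi_p, interp)
            if ok:
                models.append(interp)
    return models

P, Q = ("atom", "p"), ("atom", "q")
pi2 = [(("or", P, Q), ("top",)), (P, ("not", ("not", Q))), (Q, ("not", ("not", P)))]
```

For `pi2` the only model found is $\{p, q\}$, which coincides with the single answer set stated in Example 1, as Corollary 1 predicts for an acyclic program.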

Let us briefly mention that our encodings easily extend to typical reasoning tasks
associated with logic programs. Following [1], we define the following inference operators.
Let $\Pi$ be a logic program and $S$ a finite set of atoms.

1. Brave consequence: $\Pi \vdash_b S$ iff $S$ is contained in some answer set of $\Pi$.

2. Skeptical consequence: $\Pi \vdash_s S$ iff $S$ is contained in all answer sets of $\Pi$.

3. Disjunctive entailment: $\Pi \vdash_d S$ iff, for each answer set $I$ of $\Pi$, there is some $p \in S$
such that $p \in I$.
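
Given an explicit list of answer sets, the three inference operators reduce to simple set tests. The Python sketch below uses hypothetical function names and, as sample input, the two answer sets $\{p\}$ and $\{q\}$ of the disjunctive fact $p \lor q \leftarrow$.

```python
def brave(answer_sets, s):
    """S is contained in some answer set."""
    return any(s <= a for a in answer_sets)

def skeptical(answer_sets, s):
    """S is contained in every answer set."""
    return all(s <= a for a in answer_sets)

def disjunctive(answer_sets, s):
    """Every answer set contains at least one atom of S."""
    return all(bool(s & a) for a in answer_sets)

sample = [{"p"}, {"q"}]  # answer sets of the program {p v q <-}
```

On this sample, `{"p"}` is a brave but not a skeptical consequence, and `{"p", "q"}` is disjunctively entailed.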

We then obtain the following straightforward encodings: 

Theorem 4. Let $S$ be a finite set of atoms.

1. For any acyclic NLP $\Pi$, we have that

(a) $\Pi \vdash_b S$ iff $T[\Pi] \land \bigwedge_{p \in S} p$ is satisfiable;

(b) $\Pi \vdash_s S$ iff $T[\Pi] \to \bigwedge_{p \in S} p$ is valid; and

(c) $\Pi \vdash_d S$ iff $T[\Pi] \to \bigvee_{p \in S} p$ is valid.

2. For any HCF NLP $\Pi$, we have that

(a) $\Pi \vdash_b S$ iff $T^*[\Pi] \land \bigwedge_{p \in S} p$ is satisfiable;

(b) $\Pi \vdash_s S$ iff $T^*[\Pi] \to \bigwedge_{p \in S} p$ is valid; and

(c) $\Pi \vdash_d S$ iff $T^*[\Pi] \to \bigvee_{p \in S} p$ is valid.

Observing that the above encodings are clearly constructible in polynomial time, 
we derive the following immediate complexity results: 

Theorem 5. Checking whether $\Pi \vdash_b S$ holds, for a given acyclic or HCF NLP $\Pi$ and
a given finite set $S$ of atoms, is NP-complete. Furthermore, checking whether $\Pi \vdash_s S$
or whether $\Pi \vdash_d S$ holds, given $\Pi$ and $S$ as before, is co-NP-complete.

Note that the upper complexity bounds follow from the complexity of classical 
propositional logic, and the lower complexity bounds are inherited from the complexity 
of normal logic programs. 



5 A Generalised Shifting Approach 

The result that HCF nested programs have NP or co-NP complexity motivates the search for
a polynomial translation from HCF programs to NnLPs and furthermore to nLPs. We
do this by introducing a generalised variant of the well-known shifting technique [1,6].
Recall that shifting for DLPs is defined as follows: Let $r \in \Pi$ be a disjunctive rule in a
HCF DLP $\Pi$. Then, following [1], $\Pi$ is equivalent to the program resulting from $\Pi$ by
replacing $r$ by the following set of rules³:

$$\{p \leftarrow B(r) \land \neg(Atm(H(r)) \setminus \{p\}) \mid p \in Atm(H(r))\}. \qquad (1)$$

³ For a finite set $S$ of atoms, $\neg S$ denotes $\bigwedge_{s \in S} \neg s$.
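
The shifting operation (1) for a single disjunctive rule can be sketched in a few lines of Python. The rule representation `(head_atoms, positive_body, negative_body)` and the function name `shift` are assumptions of this sketch.

```python
def shift(rule):
    """Shift a disjunctive rule: one normal rule per head atom, with the
    remaining head atoms moved into the body under default negation."""
    head, pos, neg = rule
    return [(p, frozenset(pos), frozenset(neg) | (frozenset(head) - {p}))
            for p in sorted(head)]

fact = ({"p", "q"}, set(), set())  # p v q <-
```

Applied to the disjunctive fact $p \lor q \leftarrow$, the sketch returns the two rules $p \leftarrow \neg q$ and $q \leftarrow \neg p$. Because the head is handled as a set of atoms, a repeated head atom cannot end up negated in its own body.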
For generalising this shifting technique to nested programs, we introduce the
substitution rule (S), depicted in Table 2, which allows the replacement of $\varphi \lor \psi \leftarrow \vartheta$ by the
two rules $\varphi \leftarrow \vartheta \land \neg\psi$ and $\psi \leftarrow \vartheta \land \neg\varphi$, where $\varphi$ and $\psi$ are arbitrary expressions,
providing $Atm^+(\varphi) \cap Atm^+(\psi) = \emptyset$. Observe that (S) preserves head-cycle freeness.

In view of its proviso, (S) is not always applicable, even if a given program is HCF.
But this problem is already apparent in the case of disjunctive programs. Indeed, in (1),
we used the set $Atm(H(r))$ rather than the disjunction $H(r)$ explicitly, otherwise we
would have run into problems: For instance, the DLP $\{p \lor p \leftarrow\}$ is clearly not
equivalent to $\{p \leftarrow \neg p\}$.

We can establish the following property: 

Lemma 1. Let $\Pi$ be a HCF nested program, and let $\Pi'$ be obtained from $\Pi$ via (S).
Then, $AS(\Pi) = AS(\Pi')$.

This lemma follows from the property that models of $T^*[\cdot]$ are preserved under
application of (S), together with Theorem 3.

Theorem 6. Let $\Pi$ be a nested program.

Then, there is a program $S_{exp}[\Pi]$ obtained from $\Pi$ via a finite sequence of
substitutions from (L1)-(L4), (L6)-(L8), (L10), (L11)$^\leftarrow$, (L12), and (S), such that (i) $S_{exp}[\Pi]$
is an NnLP, and (ii) if $\Pi$ is HCF, then $AS(\Pi) = AS(S_{exp}[\Pi])$.

The “strategy” to obtain $S_{exp}[\Pi]$ from $\Pi$ is as follows: First, we translate $\Pi$ into
a program where all heads are disjunctions of atoms. Then, via (L1), (L2), and (L3),
we can easily eliminate repeated occurrences of an atom $p$ in a head. Finally, we
apply (S) to replace each (proper) disjunctive rule by a set of nested normal rules.

Observe that the subscript “exp” in $S_{exp}[\cdot]$ indicates that the size of $S_{exp}[\Pi]$ may
be exponential in the size of $\Pi$ in the worst case. The reason is the use of substitution
rule (L4). We could circumvent the application of (L4), and thus the exponential blow-up,
if we could use (S) more directly. To this end, we introduce the two substitution rules
(D) and (T*), as given in Table 2. Observe that (T*) is a generalisation of an optimisation
rule called (TAUT) due to Brass and Dix [3]. In fact, we want to apply (D) instead of
(S), but (D) may introduce new head cycles according to its definition. In particular, this
situation occurs whenever an atom occurs positively in both the body and the head of
the considered rule. Hence, the strategy is then as follows: If (S) is not applicable, we
first use (T*) to eliminate all atoms which occur positively in both the body and the head
of the considered rule. After applying (D), we are clearly allowed to apply (S) to the
resulting rules of form $\varphi \lor L \leftarrow \vartheta$, since $L$ is a new atom not occurring in $\varphi$. In order
to apply (S) after (D) and (T*), it is required that acyclicity and head-cycle freeness are
invariant under application of (D) and (T*), which is indeed the case. Given that both
substitutions can be shown to be answer-set preserving for HCF programs as well, we
obtain the following theorem.

Theorem 7. Let $\Pi$ be a nested program.

Then, there is a program $S_{poly}[\Pi]$ obtained from $\Pi$ via a polynomial sequence of
substitutions from (L1)-(L3), (L6)-(L8), (L10), (L11)$^\leftarrow$, (L12), (S), (T*), and (D) such that
(i) $S_{poly}[\Pi]$ is an NnLP, and (ii) if $\Pi$ is HCF, then $AS(\Pi) = AS(S_{poly}[\Pi])|_{Atm(\Pi)}$.

Note that $S_{poly}[\Pi]$ is polynomial in the size of $\Pi$, since the distributivity rule (L4)
is not included. Indeed, new atoms are only introduced by (D).

So far, we showed how to translate HCF nested programs into NnLPs in polynomial
time. In order to obtain a reduction to nLPs, we consider two additional rules, (Y1) and
(Y2), depicted in Table 2. The following result holds:

Proposition 3 ([29]). Let $\Pi$ be an NnLP, and let $\Pi'$ be obtained from $\Pi$ via (Y1)
or (Y2).

Then, $AS(\Pi) = AS(\Pi')|_{Atm(\Pi)}$.

Putting the previous results together, the following property can be shown:

Theorem 8. Let $\Pi$ be a nested program.

Then, there is a program $S[\Pi]$ obtained from $\Pi$ via a polynomial sequence of
substitutions from (L1)-(L3), (L6)-(L9), (L10), (L11)$^\leftarrow$, (L12), (S), (T*), (D), (Y1), and (Y2),
such that (i) $S[\Pi]$ is normal, and (ii) if $\Pi$ is HCF, then $AS(\Pi) = AS(S[\Pi])|_{Atm(\Pi)}$.

Example 4. Observe that program $\Pi_2$ from Example 1 can be translated into the nLP

$S[\Pi_2] = \{p \leftarrow \neg q;\ q \leftarrow \neg p;\ p \leftarrow \neg L_1;\ L_1 \leftarrow \neg q;\ q \leftarrow \neg L_2;\ L_2 \leftarrow \neg p\}$.
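
The normal program of Example 4 is small enough to check by exhaustive search. The Python sketch below computes stable models of a normal program through the Gelfond-Lifschitz reduct and the least model of the resulting definite program; the rule representation `(head, positive_body, negative_body)`, the atom names `L1` and `L2`, and the function names are assumptions of this sketch.

```python
from itertools import combinations

def least_model(definite_rules):
    """Least model of a definite program given as (head, positive_body) pairs."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def stable_models(program):
    atoms = sorted({a for head, pos, neg in program for a in {head} | pos | neg})
    found = []
    for size in range(len(atoms) + 1):
        for combo in combinations(atoms, size):
            interp = set(combo)
            # reduct: drop rules blocked by interp, then drop negative literals
            red = [(head, pos) for head, pos, neg in program if not (neg & interp)]
            if least_model(red) == interp:
                found.append(interp)
    return found

# The six rules of Example 4
shifted = [
    ("p", set(), {"q"}), ("q", set(), {"p"}),
    ("p", set(), {"L1"}), ("L1", set(), {"q"}),
    ("q", set(), {"L2"}), ("L2", set(), {"p"}),
]
```

The search finds the single stable model $\{p, q\}$, whose restriction to $\{p, q\}$ agrees with the answer set of $\Pi_2$ given in Example 1.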

6 Tight Nested Logic Programs 

It is well known that every stable model of an NnLP $\Pi$ is a model of $COMP[\Pi]$
(cf., e.g., [11]). However, the converse holds only providing certain syntactic
restrictions are enforced. Such conditions were first given by Fages [12] for nLPs, and
subsequently extended by Erdem and Lifschitz [11] for NnLPs. In the latter work, the notion
of tight nested normal logic programs is introduced. In this section, we extend tightness
to general nested logic programs and show that HCF NLPs which satisfy tightness can
be reduced to theories of classical propositional logic by means of translation $T[\cdot]$. That
is, the resultant theories are equivalent to $COMP[\Pi]$ in case of an NnLP $\Pi$.

Following [11], we define the positive conjunctive components of an expression $\varphi$,
denoted $cc(\varphi)$, as follows: First, every expression $\varphi$ can be written in the form $\varphi_1 \land
\cdots \land \varphi_n$ ($n \geq 1$), where each $\varphi_i$ is not a conjunction. The formulas $\varphi_1, \ldots, \varphi_n$ are
called the conjunctive components of $\varphi$. Then, $cc(\varphi)$ is the conjunction of all those
conjunctive components of $\varphi$ such that at least one atom occurs positively in it. Note
that, e.g., $cc(\neg p) = \top$, where $p$ is some atom.

Definition 7. A nested program $\Pi$ is tight on an interpretation $I$ iff there exists a
function $\lambda$ from $Atm(\Pi)$ to ordinals such that, for each rule $r \in \Pi$, if $I \in Mod(H(r) \land
B(r))$, then $\lambda(p) < \lambda(q)$, for each $p \in Atm(cc(B(r)))$ and each $q \in Atm^+(H(r))$.

Obviously, this definition generalises the one in [11]. Using our translation $T[\cdot]$, we
can reformulate the main theorem in [11] as follows:

Proposition 4 ([11]). Let $\Pi$ be an NnLP, and let $I \subseteq Atm(\Pi)$ be an interpretation
such that $\Pi$ is tight on $I$.

Then, $I \in AS(\Pi)$ iff $I \in Mod(T[\Pi])$.
We generalise this proposition by showing that $T[\Pi]$ is applicable to tight HCF
nested programs as well. To this end, we make partial use of the results discussed in the
previous section showing how nested programs can be reduced to NnLPs. Note that,
whenever such a translation simultaneously retains tightness and models of $T[\cdot]$, we
directly get the desired generalisation, according to Proposition 4.

Lemma 2. Let $\Pi$ be a nested program, let $I$ be an interpretation, and let $\Pi'$ be
obtained from $\Pi$ via any substitution from (L1)-(L8), (L12), (L11)$^\leftarrow$, or (S).

Then, $\Pi'$ is tight on $I$ whenever $\Pi$ is tight on $I$.

Lemma 3. Let $\Pi$ be a nested program, and let $\Pi'$ be obtained from $\Pi$ via any
substitution from (L1)-(L12), (L11)$^\leftarrow$, or (S).

Then, $Mod(T[\Pi]) = Mod(T[\Pi'])$.

Observe that not all substitution rules from Table 2 used in Theorem 6 to obtain
NnLPs are included in Lemma 2. In fact, there is some problem with (L10). Consider
the program $\Pi = \{a \leftarrow b;\ b \land c \leftarrow a\}$, which is tight on interpretation $I = \{a, b\}$,
since only for the first rule $r = a \leftarrow b$ the condition $I \in Mod(H(r) \land B(r))$ from
Definition 7 holds. Applying (L10), we obtain $\Pi' = \{a \leftarrow b;\ b \leftarrow a;\ c \leftarrow a\}$, which
is not tight on $\{a, b\}$ anymore, because now both $I \in Mod(H(r) \land B(r))$ and $I \in
Mod(H(r') \land B(r'))$ hold, for $r = a \leftarrow b$ and $r' = b \leftarrow a$. We therefore replace
(L10) by the new rule (C) from Table 2, which can be shown to retain tightness, models
of $T[\cdot]$ (modulo newly introduced atoms), and head-cycle freeness.
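
The counterexample for (L10) can be replayed in Python. The sketch below rests on two assumptions that go beyond the text: it handles only rules whose head is a conjunction of atoms and whose body is a conjunction of positive atoms, and it uses the fact that, for a finite program, a suitable function into the ordinals exists exactly when the induced "must be smaller" relation has no cycle.

```python
def is_tight_on(program, interp):
    """Rules are (head_atoms, body_atoms), both read as conjunctions of atoms."""
    # Only rules whose head and body both hold under interp impose constraints.
    edges = {(p, q) for head, body in program
             if head <= interp and body <= interp
             for p in body for q in head}
    nodes = {v for edge in edges for v in edge}
    # Peel off nodes without incoming edges; a non-empty remainder is cyclic.
    while nodes:
        sources = {v for v in nodes
                   if not any(q == v and p in nodes for p, q in edges)}
        if not sources:
            return False
        nodes -= sources
    return True

before = [({"a"}, {"b"}), ({"b", "c"}, {"a"})]        # a <- b;  b and c <- a
after = [({"a"}, {"b"}), ({"b"}, {"a"}), ({"c"}, {"a"})]  # result of applying (L10)
```

On the interpretation $\{a, b\}$, the sketch reports the original program as tight and the program obtained via (L10) as not tight.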

By these invariance results, we get the main result of this section. 

Theorem 9. Let $\Pi$ be a HCF nested program, and let $I \subseteq Atm(\Pi)$ be an
interpretation such that $\Pi$ is tight on $I$.

Then, $I \in AS(\Pi)$ iff $I \in Mod(T[\Pi])$.

7 Conclusion 

In this paper, we introduced the classes of acyclic and head-cycle free nested pro- 
grams as generalisations of similar classes originally introduced for disjunctive logic 
programs. We furthermore extended several results related to Clark’s completion to 
our classes of programs, by introducing the polynomial reductions $T[\cdot]$ and $T^*[\cdot]$ to
classical propositional logic. Moreover, we extended the notion of tightness to nested 
programs, and we constructed a polynomial translation of HCF nested programs into 
normal programs by applying a generalised shifting technique. We also derived imme- 
diate complexity results, showing that acyclic and HCF nested programs have a lower 
complexity than arbitrary NLPs, providing the polynomial hierarchy does not collapse. 

Transformations $T[\cdot]$ and $T^*[\cdot]$ can also be viewed as optimisations of a translation
studied in [26], in which (arbitrary) nested programs are efficiently mapped to quanti- 
fied Boolean formulas such that the stable models of the former are given by the models 
of the latter. Hence, the present results show that, in case of acyclic and HCF programs, 
a reduction to classical formulas suffices instead of a reduction to the more expressive 
class of quantified Boolean formulas. 
The translation of acyclic and HCF nested programs to nLPs optimises a polynomial
translation presented in [25] from arbitrary nested programs into disjunctive logic
programs, in the sense that our current method (i) introduces in general fewer addi- 
tional variables, and (ii) translates a subclass of NLPs into a (presumably) less complex 
subclass of DLPs, viz. normal programs. 

Furthermore, our translation also extends and optimises a recent result due to Eiter
et al. [8] which discusses a general method to eliminate disjunctions from a given DLP
under different notions of equivalence. To wit, under ordinary equivalence (i.e.,
preservation of stable models), the aforementioned method allows one to transform a given DLP
into an nLP by applying the usual shifting technique [1] and by adding suitable rules
in order to retain equivalence between the programs. However, in general, the size of
the resultant programs is exponential in the size of the input programs. Hence, for HCF
programs, we obtain not only a generalisation of this general result to the nested case
but also a polynomial method to achieve a transformation to nLPs.

Following the remarks in [29], our polynomial transformations from HCF nested
programs into normal programs can be used to utilise extant answer-set solvers, like
DLV [7], Smodels [27], or ASSAT [19], for computing answer sets of HCF nested
programs. Furthermore, the present results indicate how to compute answer sets of HCF
NLPs directly by generalising graph-based methods as described in [4, 20, 16]. More
precisely, we may define $Atm^-(\psi)$ as the set of atoms having negative occurrences in
$\psi$, which enables us to express positive as well as negative dependencies between atoms
in expressions. Therefore, graph coloring techniques as described in [16, 21], and used
as the basis of the noMoRe system [22], may be generalised to HCF NLPs. Hence, our
approach offers different ways for answer-set computation of nested programs.

Although our current results are established for programs containing only one kind 
of negation, viz. default negation, they can be extended to programs allowing strong 
negation as well. Furthermore, another issue is the lifting of the notions of acyclic and 
head-cycle free nested programs to the first-order case, which can be done along the 
lines of [14]. 
Abstract. Learning algorithms such as decision tree learners dynamically
generate a huge number of large queries. Because these queries
are executed often, the trade-off between meta-calling and compiling & 
running them has been in favor of the latter, as compiled code is faster. 
This paper presents a technique named control flow compilation, which 
improves the compilation time of the queries by an order of magnitude 
without reducing the performance of executing the queries. We exploit 
the technique further by using it in a just-in-time manner. This improves 
performance in two ways: it opens the way to incremental compilation 
of the generated queries, and also gives potentially large gains by never 
compiling dynamically unreachable code. Both the implementation of 
(lazy) control flow compilation and its experimental evaluation in a real 
world application are reported on. 



1 Introduction 

In previous work, query packs [5] were introduced as an efficient method for 
executing a set of similar queries. A query pack is basically the body of a rule 
with no arguments, with a huge number of literals and disjunctions. The query 
pack execution mechanism deals with the disjunctions in a special way, namely 
by avoiding a branch which already succeeded before. Query packs are interesting 
in the context of several types of learners, including first order decision trees [4, 
8], first order pattern discovery [7], and rule-based learners [9-11]. These query 
packs can be executed in ilProlog [1], a WAM [17] based Prolog algorithm with 
special support for Inductive Logic Programming (ILP). They are generated 
dynamically by the ILP algorithm, compiled by the underlying Prolog system, 
after which the compiled code is executed on a dataset (which is actually a large 
collection of different logic programs). This is not an unreasonable approach, 
and we have indeed measured large speedups in ILP systems based on this 
approach [5]. 

However, measurements indicate that the compilation of a complete query 
pack is a very costly operation, and sometimes causes more overhead than what 
is gained later when executing the compiled version (instead of meta-calling the 
query pack). Also, some goals of a generated query pack fail on each example, 
meaning that the part of the query following that goal was compiled in vain. 
Therefore, we started investigating control flow compilation [14] as a more flex- 
ible and faster alternative to classical compilation. This is basically a hybrid 
between compilation and meta-call. While classical WAM code for a compiled
query contains both instructions encoding the calls to predicates and instruc- 
tions dealing with the control flow (e.g. selection of clauses/branches), control 
flow compilation only generates the control flow instructions and uses meta-call 
to deal with the calls. The resulting compilation is much less expensive, and the 
generated code is as fast as classical compiled code. Moreover, this technique 
allows a lazy compilation scheme, which only compiles a part of a query when 
it is actually executed. Not only does this avoid redundant compilation, lazy 
compilation is also a first step towards supporting the incremental generation 
of queries and query packs. Introducing laziness in the full WAM compiler is 
not straightforward, because its variable classification and allocation scheme is 
optimized towards the situation where all the code is known. Moreover, because 
query packs are very large, specialized techniques for dealing with their variables
are needed [16], which complicates matters even further. On the other hand, the 
control flow compiler does not have to deal with the variables in the query packs, 
and can therefore compile an increment to a query almost independently of the 
previous query. 

In this paper, we present control flow compilation and its lazy variant as an 
innovative way to deal with compilation overhead and to achieve faster execution 
of queries. We illustrate its advantages with real life examples. Lazy control flow 
compilation is also an enabling technology for incrementality in the ILP process 
of query (pack) generation and execution. In principle, any application depending 
on an efficient meta-call could benefit from this technique. Nevertheless, the focus 
of this work is on ILP. 

Control flow compilation is described and evaluated in Section 3. Based on 
control flow compilation, we develop a lazy compilation scheme for queries con- 
taining conjunctions and disjunctions in Section 4. (Lazy) control flow compila- 
tion is extended to query packs in Section 5. We evaluate our approaches using 
both artificial and real world experiments. Finally, Section 6 concludes and dis- 
cusses future work. 

We assume that the reader is familiar with the WAM [2]. 

2 Background: Queries in ILP 

We start by sketching a particular setting in which our work is relevant, namely
the execution of queries in Inductive Logic Programming. The goal of Inductive
Logic Programming is to find a theory that best explains a large set of data (or
examples). In the ILP setting at hand, each example is a logic program, and
the logical theory is represented as a set of logical queries. The ILP algorithm
searches for these queries using generate-and-test: generated queries are run on
sets of examples; based on the failure or success of these queries, only the ones
with the ‘best’ results¹ are kept and are extended (e.g. by adding literals). These
extended queries are in turn tested on each example, and this process continues
until a satisfactory query (or set of queries) describing the examples has been
found.

¹ Which queries are best depends on the ILP algorithm. In the case of classification,
the information gain can be used as a criterion, whereas in the case of regression,
the reduction of variance is often used.

At each iteration of the algorithm, a set of queries is executed on a large set 
of logic programs (the examples). Since these queries are the result of adding 
different literals to the end of another query, the queries in this set have a lot of 
common prefixes. To avoid repeating the common parts by executing each query 
separately, the set of queries is transformed into a special kind of disjunction: a 
query pack [5]. For example, the set of queries 

?- a, b, c, d. 

?- a, b, c, e. 

?- a, b, f, g. 

is transformed into the query 

?- a, b, ( (c, (d;e)) ; f, g ).

by applying left factoring on the initial set of queries. However, because only the 
success of a query on an example is measured, the normal Prolog disjunction 
might still cause too much backtracking. So, for efficiency reasons the ’;’/2 is 
given a slightly different semantics in query packs: it cuts away branches from 
the disjunction as soon as they succeed. Since each query pack is run on a large 
set of examples, a query pack is first compiled, and the compiled code is executed 
on the examples. This compiled code makes use of dedicated WAM instructions 
for the query pack execution mechanism. More details can be found in [5]. 
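The left factoring step described above can be illustrated with a small Python sketch. It is not the ilProlog implementation: the function names `left_factor` and `show` are hypothetical, goals are plain strings, and the sketch assumes that no query is a proper prefix of another (such a query would be lost in the trie).

```python
def left_factor(queries):
    """Build a trie (nested dicts) from queries given as lists of goals."""
    trie = {}
    for goals in queries:
        node = trie
        for goal in goals:
            # Shared prefixes reuse the same path in the trie.
            node = node.setdefault(goal, {})
    return trie


def show(trie):
    """Render the trie as a Prolog-style body with nested disjunctions."""
    branches = []
    for goal, rest in trie.items():
        branches.append(goal if not rest else f"{goal}, {show(rest)}")
    if len(branches) == 1:
        return branches[0]
    return "(" + " ; ".join(branches) + ")"


queries = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["a", "b", "f", "g"]]
pack = show(left_factor(queries))
print("?- " + pack + ".")
```

The trie relies on Python dictionaries preserving insertion order, so the branches appear in the order in which the queries were given.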

3 Control Flow Compilation 

3.1 Technology

Executing compiled queries instead of meta-calling them results in considerable 
speedups. However, compilation of a query can take as much time as the ex- 
ecution of the query on all examples. Moreover, classical compilation makes it 
very difficult to exploit the incremental nature of query generation in the ILP 
setting. It would require a tight coupling between the generation of the queries 
and their compilation. Also, assignment of variables to environment slots uses a 
classification of variables which assumes that all the code is known at compile 
time. This motivated the preliminary study of alternatives for compile & run in
[14]. The most interesting alternative is control flow compilation, which is a
hybrid between meta-calling and compiling a query. In this section, we introduce
control flow compilation for queries whose bodies consist of conjunctions and 
disjunctions. Control flow compilation for query packs is discussed in Section 5. 

The essential difference between classical compilation and control flow com- 
pilation is the sequence of instructions generated for setting up and calling a 
goal. Instead of generating the usual WAM put and call instructions, the latter 
generates one new cf_call instruction, whose argument points to a heap data
structure (the goal) that is meta-called. Hence, control flow code only contains
the control flow instructions (try, retry, ...) and cf_call (and cf_deallex)
instructions.

For example, control flow compiling the query 

q :- a(X,Y), ( b(Y,Z) ; c(Y,Z), d(Z,U) ; e(a,Y) ).

results in the code in the left part of Figure 1. Note that the query itself is a term
on the heap, and that we use &a(X,Y) to represent the pointer to its subterm
a(X,Y). On the right of Figure 1 is the classical compiled code for the same
query. Before calling each goal, the compiled code first sets up the arguments to
the goal, whereas the control flow compiled code uses a reference to the subterm
of the query to indicate the goal that is called. One important aspect is that
the control flow code saves emulator cycles, because it contains no instructions
related to the arguments of the goals that are called. Moreover, the absence of
such instructions is very convenient for the lazy compilation we have in
mind. Suppose that we want to extend a query by adding a disjunction after
its last call (e.g. refining e(a,Y) into e(a,Y),(f(Y,Z);g(Y,U),h(U,V))); within the
control flow compilation scheme, it is possible to extend the existing code just by
adding more control flow instructions at the end, without the usual compilation
issues concerning the variables.

q :- a(X,Y), ( b(Y,Z) ; c(Y,Z), d(Z,U) ; e(a,Y) ).

Control flow code          Compiled code

allocate 2                 allocate 4
                           bldtvar A1
                           putpvar Y2 A2
cf_call &a(X,Y)            call a/2
trymeorelse L1             trymeorelse L1
                           putpval Y2 A1
                           bldtvar A2
cf_deallex &b(Y,Z)         deallex b/2
retrymeorelse L2           retrymeorelse L2
                           putpval Y2 A1
                           putpvar Y3 A2
cf_call &c(Y,Z)            call c/2
                           putpval Y3 A1
                           bldtvar A2
cf_deallex &d(Z,U)         deallex d/2
trustmeorelsefail          trustmeorelsefail
                           putpval Y2 A2
                           put_atom A1 a
cf_deallex &e(a,Y)         deallex e/2

Fig. 1. Control flow compiled code vs. classical compiled code.
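As an illustration only, the following Python sketch generates control flow code of the shape shown in the left column of Figure 1. It is not the compiler of ilProlog. The function name `control_flow_compile` and the query representation (a list of goal strings whose last item may be a tuple of alternative bodies) are assumptions made for this sketch, the environment size in `allocate 2` is simply copied from Figure 1, and a disjunction is only supported as the last item of a conjunction, which is the shape produced by left factoring. The sketch also prints the labels L1 and L2 explicitly at their targets.

```python
from itertools import count


def control_flow_compile(body):
    """Translate a query body into a list of control flow instructions.

    A body is a list of goals (strings); its last item may instead be a
    disjunction, written as a tuple of bodies.
    """
    code = ["allocate 2"]      # size copied from Figure 1, not computed here
    labels = count(1)          # L1, L2, ...

    def emit_body(items):
        *front, last = items
        for goal in front:
            if not isinstance(goal, str):
                raise ValueError("sketch: a disjunction must be the last item")
            code.append(f"cf_call &{goal}")
        if isinstance(last, str):
            code.append(f"cf_deallex &{last}")   # last call, as in Figure 1
        else:
            emit_disjunction(last)

    def emit_disjunction(branches):
        if len(branches) < 2:
            raise ValueError("sketch: a disjunction needs two or more branches")
        pending = None         # label that the next branch has to carry
        for i, branch in enumerate(branches):
            prefix = f"L{pending}: " if pending is not None else ""
            if i == len(branches) - 1:
                code.append(prefix + "trustmeorelsefail")
            else:
                pending = next(labels)
                op = "trymeorelse" if i == 0 else "retrymeorelse"
                code.append(f"{prefix}{op} L{pending}")
            emit_body(branch)

    emit_body(body)
    return code


query = ["a(X,Y)", (["b(Y,Z)"], ["c(Y,Z)", "d(Z,U)"], ["e(a,Y)"])]
print("\n".join(control_flow_compile(query)))
```

Note that the translation never inspects the arguments of a goal: each goal is emitted as an opaque reference, which is why no variable classification is needed.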

Contrary to compiled code, control flow code cannot exist on its own, since it 
contains external references to terms on the heap. This introduces some memory
management issues: (1) these terms have to be kept alive as long as the control
flow compiled code exists; (2) when these terms are moved to another place 
in memory (e.g. by the garbage collector), the references in the code must be 
adapted as well. 
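The two memory management issues can be made concrete with a toy model. The Python sketch below is purely hypothetical and does not describe the ilProlog garbage collector: the heap is a list in which every cell holds a whole term, code is a list of (instruction, heap index) pairs, and `collect` is an invented name for a compacting collection step.

```python
def collect(heap, roots, code):
    """Compact the heap and patch the term references held by the code."""
    # Issue (1): terms referenced from control flow code count as live.
    live = set(roots) | {ref for _, ref in code}
    order = sorted(live)
    forward = {old: new for new, old in enumerate(order)}
    new_heap = [heap[old] for old in order]
    # Issue (2): references in the code follow the moved terms.
    new_code = [(op, forward[ref]) for op, ref in code]
    return new_heap, new_code


heap = ["garbage", "a(X,Y)", "more garbage", "b(Y,Z)"]
code = [("cf_call", 1), ("cf_deallex", 3)]
new_heap, new_code = collect(heap, roots=[], code=code)
print(new_heap, new_code)
```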

3.2 Evaluation 

For evaluating our approach, we added control flow compilation to the ilProlog 
system [1]. During the experiments, the heap garbage collector was deactivated,
as it does not yet take into account the control flow code. The experiments were 
run on a Pentium III 1.1 GHz with 2 GB main memory running Linux under a 
normal load. 

Two kinds of experiments are discussed: the benchmarks in Table 1 show the 
potential gain in an artificial setting, whereas the results in Table 2 are obtained 
from a real world application. 



Table 1. Experiments for artificial disjunctions (timings in milliseconds).

                 (5,5,4)        (10,5,4)       (5,10,4)        (10,10,4)       (5,5,6)
                 comp  exec     comp  exec     comp   exec     comp    exec    comp    exec
control flow       25  0.13       52  0.25      390   4.07       735   7.73      682   7.03
compile & run     322  0.28      663  0.48     4676   5.49     11856   9.18    11099   9.32
meta-call           -  2.1         -  3.79        -  31.73         -  58.83        -  58.43



The artificially generated queries in Table 1 have the following parameters:

- g: the number of goals in a branch,

- b: the branching factor in a disjunction,

- d: the nesting depth of disjunctions.

For example, for the values (2,3,1) for (g,b,d) we generate the query a(A,B,C),
a(C,D,E), (a(E,F,G), a(G,H,I); a(E,J,K), a(K,L,M); a(E,N,O), a(O,P,Q)). For
(1,2,2), the generated query has nested disjunctions: a(A,B,C), ( a(C,D,E),
(a(E,F,G) ; a(E,H,I)) ; a(C,J,K), (a(K,L,M) ; a(K,N,O))). The definition for
a/3 is simply a(_,_,_). These queries have the same structure as query packs:
disjunctions obtained from left factoring a set of conjunctions. The different values
of (g,b,d) can be found in the upper row of the table. We report on the following
three alternatives:

- control flow: the query is compiled using the control flow approach before it
is executed.

- compile & run: the query is compiled using the classical WAM before it is
executed.

- meta-call: the query is meta-called (no compilation at all).

The comp column gives the compilation time, while the exec column gives the 
execution time of a single execution of a query. 
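The generation scheme for the artificial queries can be reconstructed from the two examples given above. The Python sketch below is such a reconstruction, not the generator used for the experiments; the name `generate_query` and the variable naming beyond Z (A1, B1, ...) are assumptions.

```python
from itertools import count


def var_name(i):
    """Hypothetical naming scheme: A..Z, then A1..Z1, A2..Z2, and so on."""
    letter = chr(ord("A") + i % 26)
    suffix = i // 26
    return letter if suffix == 0 else f"{letter}{suffix}"


def generate_query(g, b, d):
    """Body with g chained goals per branch, branching b, nesting depth d."""
    fresh = count()            # index of the next unused variable

    def new_var():
        return var_name(next(fresh))

    def block(inp, depth):
        goals = []
        for _ in range(g):
            mid, out = new_var(), new_var()
            goals.append(f"a({inp},{mid},{out})")
            inp = out          # the next goal continues from this output
        text = ", ".join(goals)
        if depth > 0:
            branches = [block(inp, depth - 1) for _ in range(b)]
            text += ", (" + " ; ".join(branches) + ")"
        return text

    return block(new_var(), d)


print(generate_query(2, 3, 1))
print(generate_query(1, 2, 2))
```

Under this scheme a query contains g(1 + b + ... + b^d) goals, which gives 3905 goals for (5,5,4).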

The control flow compilation is definitely better than compile & run: the
compilation times are improved by one order of magnitude, while the execution 
times are also better. The compilation in the control flow approach is much faster 
because it does not need to perform expensive tasks such as assigning variables to 
environment slots. The better execution times are explained by the fact that only 
one emulation cycle per call is needed as no arguments have to be put in registers. 
Doubling the g parameter more or less doubles the timings. For larger queries, 
namely for (10,10,4) and (5,5,6), control flow compilation becomes a factor 15 
faster than compile & run. If the query is executed a number of times, meta-call 
is outperformed by control flow compilation (e.g. for (5,5,4), this number is 13). 
Since in ILP, each query is run on a significant number of examples, these results 
are very promising. 
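The break-even point mentioned above follows directly from Table 1: control flow compilation pays off once the accumulated saving per run exceeds its compilation time. A minimal Python check, with the timings copied from Table 1 and a hypothetical helper name `break_even_runs`:

```python
import math

# Timings in milliseconds, copied from Table 1:
# (comp, exec) for control flow, exec for meta-call.
table1 = {
    "(5,5,4)": {"control flow": (25, 0.13), "meta-call": 2.1},
    "(10,5,4)": {"control flow": (52, 0.25), "meta-call": 3.79},
    "(5,10,4)": {"control flow": (390, 4.07), "meta-call": 31.73},
    "(10,10,4)": {"control flow": (735, 7.73), "meta-call": 58.83},
    "(5,5,6)": {"control flow": (682, 7.03), "meta-call": 58.43},
}


def break_even_runs(comp, exec_compiled, exec_meta):
    """Smallest n with comp + n * exec_compiled <= n * exec_meta."""
    return math.ceil(comp / (exec_meta - exec_compiled))


for params, row in table1.items():
    comp, exec_cf = row["control flow"]
    print(params, break_even_runs(comp, exec_cf, row["meta-call"]))
```

For (5,5,4) this gives 25 / (2.1 - 0.13), which rounds up to the 13 runs quoted in the text.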

Table 2. Experiments for conjunctions from a real world application.

                       ACE:muta         ACE:bongard       ACE:carcino

Timings (seconds)      comp   exec      comp    exec      comp    exec
control flow           0.11   0.17      0.7    19.88       2.91   46.52
compile & run          0.24   0.24      3.13   19.46      16.81   44.45
meta-call                 -   0.26         -   22.41          -   83.74

Benchmark Characteristics
number of queries      2021             9335              48399
average runs/query     69.51            244.77            103.07



The real world experiment consists in running the Tilde algorithm [4] from 
the ILP system ACE [1] on three well-known datasets from the ILP community: 
Mutagenesis [13], Bongard [6] and Carcinogenesis [12]. During the execution of 
Tilde, queries are successively generated, and every query needs to be run on
a subset of the examples. These queries contain only conjunctions; disjunctions
are dealt with as query packs in Section 5. Table 2 again compares control
flow compilation with compile & run and meta-call. Times are given in seconds.
For each data set, comp gives the total compilation time (namely the time for 
compiling all the queries generated by Tilde) and exec the total execution time 
(namely the time to execute all the (compiled) queries). For each dataset, the 
lower part of Table 2 also gives the number of queries generated and the average 
number of runs per query. 

In the Tilde runs, control flow compilation is a factor 2 to 6 faster than
classical compilation. At execution time, control flow compiled code outperforms
classical compiled code for the Mutagenesis dataset, but is about 5% slower for Carcinogenesis
(which is still acceptable). When we consider the total time (namely comp + 
exec), control flow compilation is clearly the best alternative out of the three 
for Carcinogenesis. For Bongard, control flow compilation is slightly faster than 
the other two, which are comparable. Because Mutagenesis has relatively small 
queries which are run infrequently, meta-call performs best for this dataset. 
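The comparison of total times can be verified from the figures in Table 2. The Python sketch below only adds the comp and exec columns and picks the smallest total per dataset; the dictionary layout and the name `totals` are choices made for this illustration, and meta-call is given a compilation time of 0.

```python
# (comp, exec) in seconds, copied from Table 2; meta-call has no compilation.
table2 = {
    "muta": {"control flow": (0.11, 0.17), "compile & run": (0.24, 0.24),
             "meta-call": (0.0, 0.26)},
    "bongard": {"control flow": (0.7, 19.88), "compile & run": (3.13, 19.46),
                "meta-call": (0.0, 22.41)},
    "carcino": {"control flow": (2.91, 46.52), "compile & run": (16.81, 44.45),
                "meta-call": (0.0, 83.74)},
}


def totals(dataset):
    """Total time (comp + exec) per approach, rounded to two decimals."""
    return {name: round(comp + run, 2)
            for name, (comp, run) in table2[dataset].items()}


best = {}
for dataset in table2:
    t = totals(dataset)
    best[dataset] = min(t, key=t.get)
    print(dataset, t, "best:", best[dataset])
```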

The results are more pronounced for the artificial benchmarks than for the
Tilde ones for several reasons. The artificial queries are longer than the typical 
Tilde queries; making the artificial queries shorter makes the timings unreliable. 
During the artificial benchmarks, the time spent in the called goals is very small 
(only proceed), whereas in the Tilde experiments much more time is spent in 
the predicates, and as such the effect of control flow on the exec timing decreases. 
Another observation is that control flow code uses pointers to the heap, and as 
the heap garbage collection is currently deactivated, the heap contains all the 
queries ever generated. This is bad for locality: we have indeed observed that 
locality can have a large impact on the execution time in the case of control flow 
compilation. We expect that as soon as the heap garbage collector is adapted 
and is activated again, the execution times will improve. This line of reasoning is 
compatible with the fact that the number of queries in Mutagenesis is relatively 
small, such that locality is better and thus the control flow exec timing is better 
than for normal compilation. Finally, it is important to note that, while meta-call 
outperforms the other approaches for one of the datasets, its speedup will have 
to be sacrificed when we want to benefit from removing branches that already 
succeeded in the query packs approach. 

The main goal of control flow compilation was to have a flexible scheme for 
introducing lazy compilation for query packs, without slowing down execution 
itself. Our experiments prove that control flow compilation achieves this goal: if 
the execution times are slower, it is within an acceptable range of 5%, and in all 
our benchmarks the loss is compensated by the order of magnitude that can be 
gained for the compilation. 

4 Lazy Control Flow Compilation 

4.1 Technology

In [3], lazy compilation is identified as a kind of just-in-time (JIT) compilation or 
dynamic compilation, which is characterized as translation which occurs after a
program begins execution. In this paper, we present lazy variants of control flow 
compilation. The requirement in [3] that the compiler used for JIT compilation 
should be fast enough is satisfied by our control flow compiler. Our lazy variant 
implicitly calls the control flow compiler when execution reaches a part of the 
query that is not yet compiled. As before, we restrict the discussion in this section 
to queries with conjunctions and disjunctions; the extension to query packs is 
presented in Section 5. 

As with normal control flow compilation, the query is represented by a term 
on the heap. We introduce a new WAM instruction lazy_compile, whose argu- 
ment is a pointer to the term on the heap that needs compiling when execution 
reaches this instruction. 

Consider the query q :- a(X,Y), b(Y,Z). The initial lazy compiled version of
q is

allocate 2
lazy_compile &(a(X,Y), b(Y,Z))

The lazy_compile instruction points to a conjunction: its execution replaces
itself by the compiled code for the first conjunct, namely a cf_call, and adds
for the second conjunct another lazy_compile instruction, resulting in:

allocate 2
cf_call &a(X,Y)
lazy_compile &b(Y,Z)

The execution continues with the newly generated cf_call instruction as is 
expected. After the next execution of lazy_compile, the compiled code is equal 
to code generated without laziness: 

allocate 2
cf_call &a(X,Y)
cf_deallex &b(Y,Z)

Note that lazy compilation overwrites the lazy_compile instruction with a cf_ 
instruction, and that once we have executed the query for the first time com- 
pletely, the resulting code is the same as the code produced by non-lazy control 
flow compilation. 
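The self-overwriting behaviour of lazy_compile on a conjunction can be mimicked with a small interpreter. The Python sketch below is a toy model under simplifying assumptions, not the WAM extension itself: instructions are (opcode, argument) tuples, a conjunction is a list of goal strings, and `call` is a hypothetical callback that decides whether a goal succeeds.

```python
def run_lazy(code, call):
    """Execute `code` in place; `call(goal)` tells whether a goal succeeds."""
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        if op == "allocate":
            pc += 1
        elif op == "lazy_compile":
            first, *rest = arg
            if rest:
                # Overwrite with a cf_call and leave the rest uncompiled.
                code[pc] = ("cf_call", first)
                code.insert(pc + 1, ("lazy_compile", rest))
            else:
                code[pc] = ("cf_deallex", first)
            # pc is not advanced: the freshly written instruction runs next.
        elif op in ("cf_call", "cf_deallex"):
            if not call(arg):
                return False   # failure: the remaining goals stay uncompiled
            pc += 1
    return True


code = [("allocate", 2), ("lazy_compile", ["a(X,Y)", "b(Y,Z)"])]
run_lazy(code, lambda goal: True)
print(code)

partial = [("allocate", 2), ("lazy_compile", ["a(X,Y)", "b(Y,Z)"])]
run_lazy(partial, lambda goal: False)
print(partial)
```

After a fully successful run the code equals the non-lazy control flow code, whereas a failing first goal leaves the lazy_compile instruction for the second goal in place.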

Now, consider the lazy compilation of the query from Figure 1: 

q :- a(X,Y), ( b(Y,Z) ; c(Y,Z), d(Z,U) ; e(a,Y) ).

Initially, the code is

allocate 2
lazy_compile &(a(X,Y), (b(Y,Z) ; c(Y,Z), d(Z,U) ; e(a,Y)))

The lazy_compile changes the code to:

allocate 2
cf_call &a(X,Y)
lazy_compile &(b(Y,Z) ; c(Y,Z), d(Z,U) ; e(a,Y))

Now, lazy_compile will compile a disjunction. Where normal (control flow) 
compilation would generate a trymeorelse instruction, we generate a lazy vari- 
ant of this. The lazy_trymeorelse instruction has as its argument the second 
part of the disjunction, which will be compiled upon failure of the first branch. 
The instruction is immediately followed by the code of the first branch, which 
is initially again a lazy_compile: 

allocate 2
cf_call &a(X,Y)
lazy_trymeorelse &(c(Y,Z), d(Z,U) ; e(a,Y))
lazy_compile &b(Y,Z)

Execution continues with the lazy_trymeorelse: a special choice point is created
such that on backtracking the remaining branches of the disjunction will be
compiled in a lazy way. To achieve this, the failure continuation of the choice
point is set to a new lazy_disj_compile instruction, which behaves similarly to
lazy_compile. Then, execution continues with the first branch:

allocate 2
cf_call &a(X,Y)
lazy_trymeorelse &(c(Y,Z), d(Z,U) ; e(a,Y))
cf_deallex &b(Y,Z)

Upon backtracking to the special choice point created in lazy_trymeorelse,
the lazy_disj_compile instruction continues compilation, and replaces the
corresponding lazy_trymeorelse by a trymeorelse instruction whose argument is
the address of the code to be generated:

allocate 2
cf_call &a(X,Y)
trymeorelse L1
cf_deallex &b(Y,Z)

L1: lazy_retrymeorelse &(e(a,Y))
lazy_compile &(c(Y,Z), d(Z,U))

Here, lazy_retrymeorelse, the lazy variant of retrymeorelse, behaves
similarly to lazy_trymeorelse, but instead of creating a special choice point, it alters
the existing choice point. It is immediately followed by the code of the next part
of the disjunction, which after execution looks as follows:

allocate 2
cf_call &a(X,Y)
trymeorelse L1
cf_deallex &b(Y,Z)

L1: lazy_retrymeorelse &(e(a,Y))
cf_call &c(Y,Z)
cf_deallex &d(Z,U)

Upon backtracking, lazy_retrymeorelse is overwritten, and a trustmeorelsefail
is generated for the last branch of the disjunction, followed by a lazy_compile
for this branch:

allocate 2
cf_call &a(X,Y)
trymeorelse L1
cf_deallex &b(Y,Z)

L1: retrymeorelse L2
cf_call &c(Y,Z)
cf_deallex &d(Z,U)

L2: trustmeorelsefail
lazy_compile &e(a,Y)

After the execution of the last branch, we end up with the full control flow code. 

The lazy compilation as we described it proceeds from goal to goal. Other
granularities have been implemented and evaluated as well (see Table 3):

- Per conjunction: All the goals in a conjunction are compiled at once. This
avoids constant switching between the compiler and the execution by
compiling bigger chunks.

- Per disjunction: All the branches of a disjunction are compiled at once up
to the point where a new disjunction occurs. This approach is reasonable,
because all branches of a disjunction will be tried (and thus compiled)
eventually.

Besides the overhead of switching between compiler and execution, these 
approaches might also generate different code depending on the execution itself. 
When a goal inside a disjunction fails, the next branch of the disjunction is
executed, and newly compiled code is inserted at the end of the existing code. 
When in a later stage the same goal succeeds, the rest of the branch is compiled 
and added to the end of the code, and a jump to the new code is generated. 
These jumps cost extra emulator cycles and decrease locality of the code. Lazy 
compilation per goal can in the worst case have as many jumps as there are goals 
in the disjunctions. Compiling per conjunction can have as many jumps as there 
are disjunctions. If a disjunction is completely compiled in one step, each branch 
of the disjunction ends in a jump to the next disjunction. 

4.2 Evaluation 

The experiments of Table 3 use some of the artificial benchmarks from Table 1. 
Timings (in milliseconds) are given for the different settings of the lazy com- 
pilation. The timings report the time needed for one execution of the query, 
thus including the time of its lazy compilation. The last line gives the times for
the non-lazy control flow compilation². Lazy compilation per goal clearly has
a substantial overhead, whereas the other settings have a small overhead. We
also measured the execution times for the three lazy alternatives once they are 
compiled: they were all equal, and are therefore not included in the table. 



Table 3. Lazy compilation for several kinds of disjunctions (timings in milliseconds).

                 (5,5,4)     (10,5,4)
                 cexec       cexec
per goal            55         111
per conj            34          60
per disj            32          59
control flow        28          59



The main message here is that the introduction of laziness in the control flow
compilation does not degrade performance much, and that it opens perspectives
for query packs compilation: (1) lazy compilation is fast; (2) in non-artificial
benchmarks, some branches will never have to be compiled due to failure of
goals, whereas in our artificial setting all goals in the queries succeed; (3) in
the long run, it allows incremental compilation: if we allow open-ended
queries (queries that end with an uninstantiated call), the ILP system can refine
the query later by further instantiating the open end, and lazy compilation will
automatically compile the new part of the query when it is reached.

² Note that these timings are slightly higher than the sum of comp and exec in
Table 1. This is probably due to the fact that both experiments are run in different
circumstances with different locality.

5 Lazy Control Flow Compilation for Query Packs 

5.1 Technology 

So far, we restricted our (lazy) control flow compilation approach to queries 
containing conjunctions and ‘ordinary’ disjunctions. However, the main motiva- 
tion for this work was optimizing the execution of query packs [5]. These query 
packs represent a set of (similar) queries which are to be executed, laid out in 
a disjunction. The semantics of this disjunction is implemented by dedicated 
WAM instructions [5], as explained in Section 2. These instructions replace the 
instructions generated for encoding ordinary disjunctions. 
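The pack semantics recalled above (a branch is cut away from the disjunction as soon as it succeeds) can be mimicked outside the WAM. The Python sketch below is an illustration of that idea under our own assumptions rather than a description of the dedicated instructions of [5]: a goal is modelled as a function from a binding to a list of new bindings, node names are assumed to be unique, and `Node` and `run_pack` are hypothetical names.

```python
class Node:
    def __init__(self, name, goal, children=()):
        self.name = name              # unique label, used for reporting
        self.goal = goal              # function: binding -> list of bindings
        self.children = list(children)


def run_pack(root, start=None):
    """Run a pack on one example; return succeeded branches and call counts."""
    succeeded = set()                 # names of branches already cut away
    calls = {}                        # goal name -> number of calls

    def run(node, binding):
        calls[node.name] = calls.get(node.name, 0) + 1
        for new_binding in node.goal(binding):
            for child in node.children:
                # A branch that succeeded before is never tried again.
                if child.name not in succeeded and run(child, new_binding):
                    succeeded.add(child.name)
            if all(c.name in succeeded for c in node.children):
                return True           # nothing left to prove below this node
        return False

    run(root, start)
    return succeeded, calls


a = lambda _: [1, 2, 3]               # three solutions
b = lambda x: [x]                     # always succeeds
c = lambda x: [x] if x == 3 else []   # succeeds only for the last solution
pack = Node("a", a, [Node("b", b), Node("c", c)])
succeeded, calls = run_pack(pack)
print(succeeded, calls)
```

With the ordinary Prolog disjunction, b would be called once for each of the three solutions of a; here it is called only once, because its branch is removed after its first success.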

Extending control flow compilation to handle these query packs is rather 
straightforward. The difference between the compilation of disjunctions handled 
so far and the disjunctions of a query pack is that the dedicated WAM in- 
structions have to be generated as control flow instructions for the disjunctions. 
Introducing laziness in control flow compilation for query packs requires more 
changes. Originally, query packs used static data structures which were allocated 
once, since all the information on the size and contents of these data structures 
was known at compile time. However, when laziness is introduced, only parts of 
the query pack are analyzed, and so the data structures need to be dynamic and 
expandable. 

To facilitate the implementation of lazy control flow compilation for query 
packs, we chose to implement only one of the lazy variants described in Section 
4. Since the experiments showed little difference between all the variants (except 
for lazy compilation per goal), this seems like a reasonable decision. We chose 
to compile one complete disjunction at a time, because this makes integration 
with the existing query pack data structures easier. 

5.2 Experiments 

The experiments are again performed with the real world applications from 
Table 2. Instead of a set of queries (the conjunctions of Table 2), Tilde now 
generates query packs. These query packs are then compiled and finally executed 
for a subset of the examples. The use of query packs allows us to set up a larger 
experiment (in ILP terms: we now use a lookahead of 3 instead of 2), which 
results in more and longer queries. 

The timings in Table 4 are in seconds: for compile & run and control flow,
we give the sum of the total compilation time and the total execution time; for
lazy control flow compilation, no distinction can be made, and so the total time
for compilation and execution is given.

Table 4. Experiments for query packs from a real world application.

                       ACE:muta            ACE:bongard            ACE:carcino

Timings (seconds) (comp + exec = cexec)
control flow           0.13+0.08 = 0.21    1.02+23.75 = 24.77     2.92+7.07 = 9.99
lazy control flow      0.24                24.0                   8.15
compile & run          0.69+0.11 = 0.80    12.27+22.48 = 34.75    47.97+5.24 = 53.21

Query Pack Characteristics
Nb. of packs           50                  4                      28
Nb. of queries         6010                63668                  204527
Avg. runs/pack         61.52               723.50                 134.67

Code size reduction with lazy compilation
Reduction              17.0 %              57.2 %                 61.4 %

First, we compare control flow compilation with compile & run. For query 
packs, control flow compilation is also up to an order of magnitude faster than 
classical compilation, even though the ilProlog system already has a compiler 
that is optimized for dealing with large disjunctions [16] (in particular for the 
classification of variables in query packs). The execution times show the same 
characteristics as in the experiments with the conjunctions in Table 2: control 
flow has a faster execution in the case of Mutagenesis, whereas in the other 
two cases it is a bit slower. For the ILP application, the total time must be 
considered: the total time of control flow is up to a factor 4 faster than compile 
& run. 

Next, Table 3 shows that lazy compilation has some overhead, but we hoped 
that it would be compensated by avoiding the compilation of failing parts in 
the query packs. For Bongard and Carcinogenesis, lazy control flow timings are 
indeed better than for the plain control flow. The information about the code 
size reduction in the case of the lazy variant confirms the idea that we gain by 
avoiding the compilation of parts of the query packs that fail for all the examples. 
Also, the locality is better when less code is generated. Mutagenesis is a smaller 
benchmark with less code reduction, and so the compilation/execution ratio is 
large. This explains why the overhead of interleaved compilation with execution 
is not compensated for. 
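The origin of the code size reduction can be illustrated with a toy model. The Python sketch below is hypothetical and much simpler than the real setting: goals are propositional, an example is the set of goals that succeed on it, a pack is a (goal, children) tree, and compilation is counted per goal, whereas our implementation compiles one complete disjunction at a time. A goal is compiled lazily only if execution reaches it for at least one example.

```python
def count_goals(pack):
    """Number of goals an eager compiler has to translate."""
    goal, children = pack
    return 1 + sum(count_goals(child) for child in children)


def lazily_compiled(pack, examples):
    """Goals (identified by their path) reached for at least one example."""
    reached = set()

    def visit(node, path, facts):
        goal, children = node
        path = path + (goal,)
        reached.add(path)             # reaching a goal triggers its compilation
        if goal in facts:             # only a succeeding goal lets us continue
            for child in children:
                visit(child, path, facts)

    for facts in examples:
        visit(pack, (), facts)
    return reached


# The pack  a, b, ((c, (d ; e)) ; f, g)  from Section 2.
pack = ("a", [("b", [("c", [("d", []), ("e", [])]), ("f", [("g", [])])])])
examples = [{"a", "b", "c", "d"}, {"a", "b"}]

eager = count_goals(pack)
lazy = len(lazily_compiled(pack, examples))
print(f"eager: {eager} goals, lazy: {lazy} goals, "
      f"reduction: {100 * (1 - lazy / eager):.1f} %")
```

In this example g is never compiled, because f fails on every example.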

The resulting timings confirm that lazy control flow compilation is the best 
approach for query packs. 

6 Conclusion and Future Work 

This paper presents a new method for faster compilation and execution of
dynamically generated queries: control flow compilation is up to an order of
magnitude faster than classical compilation, while the execution times are similar. To
our knowledge, this is also the first time that lazy compilation (as an instance 
of just-in-time compilation [3]) is used in the context of logic programming, in 
particular for queries. 

The benefits of control flow compilation versus classical compilation are clear 
and are confirmed in the context of real world applications from the ILP commu- 
nity. For larger benchmarks, the lazy variant gives the best results in combination 
with query packs. 

For control flow compilation itself, the main future work will consist in ex- 
tending the garbage collector of the ilProlog system to support control flow 
compiled code. This extension can be realized within the current garbage col- 
lector, and mainly requires a coding effort. We expect that garbage collection 
improves the locality and the execution times of queries. It also has to be inves- 
tigated whether it is interesting to put the control flow code on the heap, thus 
making code garbage collection of queries a part of the heap garbage collection 
process. 

We also plan to adapt (lazy) control flow compilation to extensions of query 
packs reported in [15]. We expect control flow compilation to yield the same 
speedups for these execution mechanisms as for query packs. However, the im- 
pact of laziness needs to be investigated. 

Finally, this work allows us to investigate how incremental generation and 
compilation of queries can be supported in an ILP system. 



Acknowledgments 

Remko Tronçon is supported by the Institute for the Promotion of Innovation by
Science and Technology in Flanders (I.W.T.). This work is partially supported 
by the GOA ‘Inductive Knowledge Bases’. We are indebted to Bart Demoen for 
his significant contributions to the achievements presented in this paper. 



References 

1. The ACE data mining system. http://www.cs.kuleuven.ac.be/~dtai/ACE/.

2. H. Ait-Kaci. The WAM: a (real) tutorial. Technical Report 5, DEC Paris Research
Report, 1990. See also: http://www.isg.sfu.ca/~hak/documents/wam.html.

3. J. Aycock. A brief history of just-in-time. ACM Computing Surveys, 35(2):97-113,
2003.

4. H. Blockeel and L. De Raedt. Top-down induction of first order logical decision
trees. Artificial Intelligence, 101(1-2):285-297, June 1998.

5. H. Blockeel, L. Dehaspe, B. Demoen, G. Janssens, J. Ramon, and H. Vandecasteele.
Improving the efficiency of Inductive Logic Programming through the use of query
packs. Journal of Artificial Intelligence, 16:135-166, 2002.

6. L. De Raedt and W. Van Laer. Inductive constraint logic. In K. P. Jantke,
T. Shinohara, and T. Zeugmann, editors, Proceedings of the Sixth International Workshop
on Algorithmic Learning Theory, volume 997 of Lecture Notes in Artificial
Intelligence, pages 80-94. Springer-Verlag, 1995.







7. L. Dehaspe and H. Toivonen. Discovery of frequent datalog patterns. Data Mining
and Knowledge Discovery, 3(1):7-36, 1999.

8. S. Kramer. Structural regression trees. In Proceedings of the Thirteenth National
Conference on Artificial Intelligence, pages 812-819, Cambridge/Menlo Park, 1996.
AAAI Press/MIT Press.

9. S. Muggleton. Inverse entailment and Progol. New Generation Computing, Special
issue on Inductive Logic Programming, 13(3-4):245-286, 1995.

10. J. Quinlan. Learning logical definitions from relations. Machine Learning,
5:239-266, 1990.

11. A. Srinivasan. The Aleph manual.
http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/.

12. A. Srinivasan, R. King, and D. Bristol. An assessment of ILP-assisted models for
toxicology and the PTE-3 experiment. In Proceedings of the Ninth International
Workshop on Inductive Logic Programming, volume 1634 of Lecture Notes in
Artificial Intelligence, pages 291-302. Springer-Verlag, 1999.

13. A. Srinivasan, S. Muggleton, M. Sternberg, and R. King. Theories for
mutagenicity: A study in first-order and feature-based induction. Artificial Intelligence,
85(1,2):277-299, 1996.

14. R. Tronçon, G. Janssens, and B. Demoen. Alternatives for compile & run in the
WAM. In Proceedings of CICLOPS 2003: Colloquium on Implementation of
Constraint and LOgic Programming Systems, pages 45-58. University of Porto, 2003.
Technical Report DCC-2003-05, DCC - FC & LIACC, University of Porto,
December 2003. http://www.cs.kuleuven.ac.be/cgi-bin-dtai/publ_info.pl?id=41065.

15. R. Tronçon, H. Vandecasteele, J. Struyf, B. Demoen, and G. Janssens. Query
optimization: Combining query packs and the once-transformation. In Inductive
Logic Programming, 13th International Conference, ILP 2003, Szeged, Hungary,
Short Presentations, pages 105-115, 2003.
http://www.cs.kuleuven.ac.be/cgi-bin-dtai/publ_info.pl?id=40938.

16. H. Vandecasteele, B. Demoen, and G. Janssens. Compiling large disjunctions. In
I. de Castro Dutra, E. Pontelli, and V. S. Costa, editors, First International
Conference on Computational Logic: Workshop on Parallelism and Implementation
Technology for (Constraint) Logic Programming Languages, pages 103-121.
Imperial College, 2000. http://www.cs.kuleuven.ac.be/cgi-bin-dtai/publ_info.pl?id=32065.

17. D. H. D. Warren. An abstract Prolog instruction set. Technical Report 309, SRI,
1983.




Speculative Computations 
in Or-Parallel Tabled Logic Programs 



Ricardo Rocha¹, Fernando Silva¹, and Vitor Santos Costa²

¹ DCC-FC & LIACC
University of Porto, Portugal
{ricroc,fds}@ncc.up.pt
² COPPE Systems & LIACC
Federal University of Rio de Janeiro, Brazil
vitor@cos.ufrj.br



Abstract. Pruning operators, such as cut, are important to develop ef- 
ficient logic programs as they allow programmers to reduce the search 
space and thus discard unnecessary computations. For parallel systems, 
the presence of pruning operators introduces the problem of speculative 
computations. A computation is named speculative if it can be pruned 
during parallel evaluation, therefore resulting in wasted effort when com- 
pared to sequential execution. In this work we discuss the problems be- 
hind the management of speculative computations in or-parallel tabled 
logic programs. In parallel tabling, not only the answers found for the 
query goal may not be valid, but also answers found for tabled predicates 
may be invalidated. The problem here is even more serious because to 
achieve an efficient implementation it is required to have the set of valid 
tabled answers released as soon as possible. To deal with this, we propose 
a strategy to deliver tabled answers as soon as it is found that they are 
safe from being pruned, and present its implementation in the OPTYap 
parallel tabling system. 



1 Introduction 

Logic programming is a programming paradigm based on Horn Clause Logic, a 
subset of First Order Logic. Given a theory (or program) and a query, execution 
of logic programs uses a simple theorem prover that performs refutation in order 
to search for alternative ways to satisfy the query. Prolog implements a refutation 
strategy called SLD resolution. Further, subgoals in a query are always solved 
from left to right, and clauses that match a subgoal are always applied in
the textual order as they appear in the program. 

In order to make Prolog a useful programming language, Prolog designers
were forced to introduce features not found within First Order Logic. One such 
feature is the cut operator. The cut operator adds a limited form of control to the 
execution by pruning alternatives from the computation. Cut is an asymmetric 
pruning operator because it only prunes alternatives to the right. Some Prolog 
systems also implement symmetric pruning operators, with a generic name of
commit. In practice, pruning operators are almost always required when
developing actual programs, because they allow programmers to reduce the search
space and thus discard unnecessary computation. 

Because their semantics are purely operational, pruning operators cause dif- 
ficulties when considering alternative execution strategies for logic programs. 
The implementation of or-parallel systems is one example [1-4]. Namely, it has 
been observed that the presence of pruning operators during parallel execution 
introduces the problem of speculative computations. Ciepielewski defines spec- 
ulative computations as work which would not be done in a system with one 
processor [5]. Alternatives picked for parallel execution may later be pruned
away by a cut. Earlier execution of such computations results in wasted effort 
when compared to sequential execution. 

Pruning operators also raise questions in the context of tabling based ex- 
ecution models for Prolog. The basic idea behind tabling is straightforward: 
programs are evaluated by storing newly found answers of current subgoals in 
an appropriate data space, called the table space. New calls to a predicate check 
this table to verify whether they are repeated. If they are, answers are recalled 
from the table instead of the call being re-evaluated against the program clauses. 

We can consider two types of cut operations in a tabling environment: cuts 
that do not prune alternatives in tabled predicates - inner cut operations, and 
cuts that prune alternatives in tabled predicates - outer cut operations. Inner 
cuts can be easily implemented in sequential systems. On the other hand,
because tabling intrinsically changes the left-to-right semantics of Prolog, outer
cuts present major difficulties, both in terms of semantics and of implementa- 
tion. 

In this work we address the problem of how to do inner pruning on systems 
that combine tabling with or-parallelism. Our interest stems from our work in 
the OPTYap system [6], to our knowledge the first available system that can 
exploit parallelism from tabled programs. Our experience has shown that many 
applications do require support for inner pruning. In contrast, outer pruning is 
not widely used in current tabling systems. Unfortunately, new problems arise 
even when performing inner pruning in parallel systems. Namely, speculative 
answers found for tabled predicates may later be invalidated. In the worst case, 
tabling such speculative answers may allow them to be consumed elsewhere in 
the tree, generating in turn more speculative computation and eventually causing
wrong answers to occur. Answers for tabled predicates can only be tabled when 
they are safe from being pruned. On the other hand, finding and consuming an- 
swers is the natural way to get a tabled computation going forward. Delaying 
the consumption of valid answers too much may compromise such flow. There- 
fore, tabled answers should be released as soon as it is found that they are not 
speculative. 

The main contribution of this paper is a design that allows the correct and 
efficient implementation of inner pruning in an or-parallel tabling system. To do 
so, we generalise Ali and Karlsson's cut scheme [3], which prunes useless work as
early as possible, to tabling systems. Our design allows speculative answers to
be stored in advance into the table, but their availability is delayed. Answers will
only be made available when proved to be not speculative. 

The remainder of the paper is organised as follows. First, we discuss specula- 
tive computations in or-parallel systems and introduce the cut scheme currently 
implemented in OPTYap. Next, we discuss the problems arising with specula- 
tive tabled computations. Initially, we introduce the basic tabling definitions and 
the inner and outer cut operations. After that, we present the support actually
implemented in OPTYap to deal with speculative tabled computations. We end 
by outlining some conclusions. 

2 Cut Within the Or-Parallel Environment 

Cut is a system built-in predicate that is represented by the symbol “!”. Its 
execution results in pruning all the alternatives to the right of the current branch 
up to the scope of the cut. In a sequential system, cut only prunes alternatives 
whose exploitation has not been started yet. This does not hold for or-parallel 
systems, as cut can prune alternatives that are being exploited by other workers 
or that have already been completely exploited. Therefore, the cut semantics 
in a parallel environment introduces new problems. First, a pruning operation 
cannot always be completely performed if the branch executing the cut is not 
leftmost, because the operation itself may be pruned by the execution of another
pruning operation in a branch to the left. Similarly, an answer for the query goal 
in a non-leftmost branch may not be valid. Last, when pruning we should stop 
the workers exploiting the pruned branches. 

Ali showed that speculative computations can be completely banned from 
a parallel system if proper rules are applied [1]. However, such rules severely
restrict parallelism. Hence, most parallel systems allow speculative computa- 
tions. Speculative computations can be controlled more or less tightly. Ideally, 
we would prune all computations as soon as they become useless. In practice, 
deciding if a computation is still speculative or already useless can be quite com- 
plex when nested cuts with intersecting scopes are considered. We next discuss 
how cut executes in OPTYap (later we will discuss how cut affects the table). 



2.1 Cut in OPTYap 

The OPTYap system builds on the or-parallel system YapOr [7] and on the 
tabling engine YapTab [8]. YapOr is based on the environment copying model 
for shared memory machines [9]. YapTab is a sequential tabling engine that
extends Yap’s execution model to support tabled evaluation for definite programs.
YapTab’s design is largely based on the ground-breaking XSB logic programming
system [10], which implements the SLG-WAM [11]. OPTYap’s execution model
considers tabling as the base component of the system. Each computational 
worker behaves as a full sequential tabling engine. The or-parallel component of 
the system is triggered to allow synchronised access to the shared part of the 
search space or to schedule work.

OPTYap currently implements a cut scheme based on the ideas presented by
Ali and Karlsson [3], designed to prune useless work as early as possible. The
guiding rule is: we cannot prune branches that would not be pruned if our own
branch were pruned by a branch to the left. Thus, a worker executing cut must
go up in the tree until it reaches either the scope of the cut or a node with
workers executing in branches to the left. A worker may not be able to complete
a cut if there are workers in branches to the left, because such workers can
themselves prune the current cut. Such incomplete cuts are called left pending.
In OPTYap, a cut is left pending on the youngest node N that has left branches.
A pending cut can only be resumed when all workers to the left backtrack to
N. It will then be the responsibility of the last worker backtracking to N to
continue the execution of the pending cut.
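The climbing rule can be sketched as follows. This is a minimal illustration, not OPTYap code: the `Node` class, its `workers_to_left` flag and the `execute_cut` function are hypothetical names, the shared tree is reduced to a single path, and the refinement for left branches that contain no cuts (discussed with Fig. 1) is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    workers_to_left: bool = False            # hypothetical: some worker is in a left branch
    pending_cut_scope: Optional[str] = None  # simplified stand-in for a cut marker


def execute_cut(youngest: Node, scope: Node) -> Optional[Node]:
    """Climb from `youngest` towards `scope`; return the node where the cut
    is left pending, or None if the scope was reached and the cut completed."""
    node: Optional[Node] = youngest
    while node is not None and node is not scope:
        if node.workers_to_left:
            node.pending_cut_scope = scope.name  # to be resumed from here later
            return node
        node = node.parent
    return None
```

The returned node plays the role of the youngest node with left branches; a real system would also signal the workers found in branches to the right.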

While going up, a worker may also find workers in branches to the right. 
If so, it sends them a signal informing that their branches have been pruned. 
Such workers must backtrack to the shared part of the tree and start searching 
for new work. Note that even if a cut is left pending in a node N, there may
be branches, older than N, that correspond to useless work. OPTYap prunes
these branches immediately. To illustrate how these branches can be detected
we present in Fig. 1 a small example taken from [3]. For simplicity, the example
ignores indexing and assumes that a node is always allocated for predicates
defined by more than one clause. To better understand the example, we index
the repeated calls to the same predicate by call order. For instance, the node
representing the first call to predicate p is referred to as $p_1$, the second as
$p_2$, and so on. We also write $p_n^{(i)}$ to denote the ith alternative of node
$p_n$. Note also that we use the symbol ! to mark the alternatives corresponding
to clauses with cuts.

Figure 1(a) shows the initial configuration, where a worker W is computing
the branch corresponding to [$p_1^{(1)}$, $q_1^{(1)}$, $p_2^{(1)}$, $q_2^{(2)}$, $p_3^{(2)}$]. Its current goal is
“!($p_2$), !($p_1$)”, where !($p_2$) means a cut with the scope $p_2$ and !($p_1$) means a cut
with the scope $p_1$. There are only two branches to the left, corresponding to
alternatives $p_3^{(1)}$ and $q_2^{(1)}$. If there are workers within alternative $p_3^{(1)}$, then W
cannot execute any pruning at all because $p_3^{(1)}$ is marked as containing cuts.
A potential execution of a pruning operation in $p_3^{(1)}$ will invalidate any cut
executed in $p_3^{(2)}$ by W. Therefore, W saves a cut marker in $p_3$ to indicate a
pending cut operation (Fig. 1(b)). A cut marker is a two-field data structure
containing information about the scope of the cut and about the alternative of
the node which executed the cut.

Let’s now assume that there are no workers in alternative $p_3^{(1)}$, but there are
in alternative $q_2^{(1)}$. Alternative $q_2^{(1)}$ is not marked as containing cuts, but the
continuation of $q_2$ contains two pruning operations, !($p_2$) and !($p_1$). The worker
W first executes !($p_2$) in order to prune $q_2^{(3)}$ and $p_2^{(2)}$. This is a safe pruning
operation because any pruning from $q_2^{(1)}$ will also prune $q_2^{(3)}$ and $p_2^{(2)}$. At the
same time W stores a cut marker in $q_2$ to signal the pruning operation done. As
we will see, for such cases, the cut marker is used to prevent unsafe future pruning
operations from the same branch. Consider the continuation of the situation: W
tries to execute !($p_1$) in order to prune $q_1^{(2)}$, $q_1^{(3)}$ and $p_1^{(2)}$. However, this is
a dangerous operation. A worker in $q_2^{(1)}$ may execute the previous pruning
operation, !($p_2$), pruning W’s branch but not $q_1^{(2)}$, $q_1^{(3)}$ or $p_1^{(2)}$. Hence, there is
no guarantee that the second pruning, !($p_1$), is safe. The cut marker stored in $q_2$
is a warning that this possibility exists. So, instead of doing pruning immediately,
W updates the cut marker stored in $q_2$ to indicate the new pending cut operation
(Fig. 1(c)).

    p([H|T]) :- q(H), p(T), !.        q(1).
    p([]).                            q(2).
                                      q(3).
    ?- p([1,2]).

Fig. 1. Pruning in the or-parallel environment (panels (a), (b) and (c))

2.2 Tree Representation 

To represent the shared part of the search tree, OPTYap follows the Muse ap- 
proach [9] and uses or-frames. When sharing work, an or-frame is added per 
choice point being shared, in such a way that the complete set of or-frames form 
a tree that represents the shared part of the search tree. Or-frames are used to 
synchronise access to the unexploited alternatives in a shared choice point, and 
to store scheduling data. By default, an or-frame contains the following fields: 
the OrFr_lock field supports a busy-wait locking mutex mechanism that guarantees
atomic updates to the or-frame data; the OrFr_alt field stores the pointer
to the next unexploited alternative in the choice point; the OrFr_members field is
a bitmap that stores the set of workers sharing the choice point; the OrFr_node
field is a back pointer to the corresponding choice point; and the OrFr_next field
is a pointer to the parent or-frame on the current branch.

Identifying workers on left branches or checking whether a branch is leftmost
requires a mechanism to represent the relative positions of workers in the search
tree. Our implementation uses a branch() matrix, where each entry branch(w, d)
corresponds to the alternative taken by worker w in the shared node with depth 
d of its current branch. Figure 2 shows a small example that clarifies the corre- 
spondence between a particular search tree and its matrix representation. Note 
that we only need to represent the shared part of a search tree in the matrix. 
This is due to the fact that the position of each worker in the private part of the 
search tree is not relevant when computing relative positions. 




Fig. 2. Search tree representation and its branch(w, d) matrix



To correctly consult or update the branch matrix, we need to know the depth
of each shared node. We thus introduced a new data field in the or-frame data
structure, the OrFr_depth field, that holds the depth of the corresponding node.
By using the OrFr_depth field and the OrFr_members bitmap of each or-frame
to consult the branch matrix, we can easily identify the workers in a node that
are in branches to the left or to the right of the current branch of a given worker.
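As an illustration, the or-frame fields and the branch matrix can be modelled as below. This is a sketch under simplifying assumptions: the members bitmap becomes a Python set, the matrix becomes a dictionary keyed by (worker, depth), the lock and choice-point back pointer are omitted, and `split_left_right` is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class OrFrame:
    depth: int                                       # models OrFr_depth
    members: Set[int] = field(default_factory=set)   # models OrFr_members (a bitmap in the text)
    next_alt: Optional[int] = None                   # models OrFr_alt
    parent: Optional["OrFrame"] = None               # models OrFr_next


# branch[(w, d)] is the alternative taken by worker w at the shared node of depth d
Branch = Dict[Tuple[int, int], int]


def split_left_right(fr: OrFrame, branch: Branch, me: int) -> Tuple[List[int], List[int]]:
    """Workers sharing `fr` that are to the left and to the right of worker `me`."""
    mine = branch[(me, fr.depth)]
    others = [w for w in fr.members if w != me]
    left = sorted(w for w in others if branch[(w, fr.depth)] < mine)
    right = sorted(w for w in others if branch[(w, fr.depth)] > mine)
    return left, right
```

Workers that took the same alternative at this node are neither left nor right here; their relative position is decided at a younger shared node.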

Let us suppose that a worker W wants to check whether it is leftmost or
at which node it ceases to be leftmost. W should start from the youngest
shared node N on its branch, read the OrFr_members bitmap from the or-frame
associated with N to determine the workers sharing the node, and investigate
the branch matrix to determine the alternative number taken by each worker
sharing N. If W finds an alternative number less than its own, then W is not
leftmost. Otherwise, W is leftmost in N and will repeat the same procedure at
the next upper node on its branch and so on until reaching the root node or a node
where it is not leftmost.
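The leftmost check just described can be written as a short loop. The sketch below is an assumption-laden model rather than the system's code: the branch of the worker is passed as a list of (depth, members) pairs, youngest node first, and `first_non_leftmost` is a hypothetical name.

```python
from typing import Dict, List, Optional, Set, Tuple

Branch = Dict[Tuple[int, int], int]   # (worker, depth) -> alternative number


def first_non_leftmost(path: List[Tuple[int, Set[int]]],
                       branch: Branch, me: int) -> Optional[int]:
    """Return the depth of the youngest shared node where worker `me` is not
    leftmost, or None if `me` is leftmost all the way up to the root."""
    for depth, members in path:
        mine = branch[(me, depth)]
        # Some worker sharing this node took a smaller alternative number.
        if any(branch[(w, depth)] < mine for w in members if w != me):
            return depth
    return None
```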



2.3 Pending Answers 

OPTYap also builds on a mechanism originally designed for a problem in or- 
parallel systems: an answer for the query goal may not be valid, if the branch 
where the answer was found may be pruned. At the end of the computation, 
only valid answers should be seen.

OPTYap addresses this problem by storing a new answer in the youngest
node where the current branch is not leftmost. A new data field was therefore
introduced in the or-frame data structure, the OrFr_qg_answers field. This field
allows access to the set of pending answers stored in the corresponding node.
Also, new data structures store the pending answers that are being found for
the query goal in hand. Figure 3 details the data structures used to efficiently
keep track of pending answers. Answers from the same branch are grouped into
a common top data structure. The top data structures are organised by reverse
branch order. This organisation simplifies the pruning of answers that become
invalid as a consequence of a cut operation to the left.




Fig. 3. Dealing with pending answers 



When a node N is fully exploited and its corresponding or-frame is being
deallocated, the whole set of pending answers stored in N can be easily linked
together and moved to the next node where the current branch is not leftmost.
At the end, the set of answers stored in the root node is the set of valid answers
for the given query goal.
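A small model may help to picture this bookkeeping. It is only a sketch: `PendingAnswers`, `prune_right_of` and `move_up` are hypothetical names, answers are plain strings, and a dictionary keyed by alternative number stands in for the chained top data structures of Fig. 3.

```python
from typing import Dict, List


class PendingAnswers:
    """Pending query-goal answers kept at one node, grouped by branch."""

    def __init__(self) -> None:
        self.by_branch: Dict[int, List[str]] = {}

    def add(self, branch: int, answer: str) -> None:
        self.by_branch.setdefault(branch, []).append(answer)

    def prune_right_of(self, branch: int) -> None:
        # A cut executed in `branch` invalidates answers found to its right.
        for b in [b for b in self.by_branch if b > branch]:
            del self.by_branch[b]

    def drain(self) -> List[str]:
        # Node fully exploited: hand over every surviving answer.
        out = [a for b in sorted(self.by_branch) for a in self.by_branch[b]]
        self.by_branch.clear()
        return out


def move_up(child: PendingAnswers, target: PendingAnswers, branch_in_target: int) -> None:
    """Move the answers of a deallocated node to an older node, where they
    all belong to the single branch that leads to the deallocated node."""
    for answer in child.drain():
        target.add(branch_in_target, answer)
```

Whatever remains in the structure attached to the root at the end of the computation corresponds to the valid answers.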

3 Cut Within the Or-Parallel Tabling Environment 

Extending the or-parallel system to include tabling introduces further complexity 
into cut’s semantics. Dealing with speculative tabled computations and guaran- 
teeing the correctness of tabling semantics, without compromising the perfor- 
mance of the or-parallel tabling system, requires very efficient implementation 
mechanisms. In this section, we present OPTYap’s approach. Before we start,
we provide a brief overview of the basic tabling definitions and distinguish the
two types of cut operations in a tabling environment: inner cuts and outer cuts.

3.1 Basic Tabling Definitions

Tabling is about storing intermediate answers for subgoals so that they can be 
reused when a repeated subgoal appears. Whenever a tabled subgoal S is first 
called, an entry for S is allocated in the table space. This entry will collect all the 
answers found for S. Repeated calls to variants of S are resolved by consuming 
the answers already stored in the table. Meanwhile, as new answers are gener- 
ated, they are inserted into the table and returned to all variant subgoals. Within 
this model, the nodes in the search space are classified as generator nodes,
corresponding to first calls to tabled subgoals, consumer nodes, corresponding to
variant calls to tabled subgoals, or interior nodes, corresponding to non-tabled
subgoals.

Tabling evaluation has four main types of operations for definite programs. 
The tabled subgoal call operation checks if the subgoal is in the table and if not, 
inserts it and allocates a new generator node. Otherwise, it allocates a consumer
node and starts consuming the available answers. The new answer operation 
verifies whether a newly generated answer is already in the table, and if not, in- 
serts it. The answer resolution operation consumes the next unconsumed answer 
from the table, if any. The completion operation determines whether a tabled 
subgoal is completely evaluated, and if not, schedules a possible resolution to 
continue the execution. 
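Three of these operations can be illustrated with a toy table. The sketch below is not how YapTab stores its data: subgoals and answers are plain strings, the `Table` class and its method names are assumptions made for illustration, and completion is omitted because it depends on the scheduling of the whole evaluation.

```python
from typing import Dict, List, Optional


class Table:
    def __init__(self) -> None:
        self.answers: Dict[str, List[str]] = {}   # subgoal -> answers in insertion order

    def subgoal_call(self, subgoal: str) -> str:
        """Tabled subgoal call: the first call is a generator, variants are consumers."""
        if subgoal not in self.answers:
            self.answers[subgoal] = []
            return "generator"
        return "consumer"

    def new_answer(self, subgoal: str, answer: str) -> bool:
        """New answer: insert unless already tabled; False marks a repeated answer."""
        if answer in self.answers[subgoal]:
            return False
        self.answers[subgoal].append(answer)
        return True

    def answer_resolution(self, subgoal: str, consumed: int) -> Optional[str]:
        """Answer resolution: next unconsumed answer, given how many were consumed."""
        found = self.answers[subgoal]
        return found[consumed] if consumed < len(found) else None
```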

The table space can be accessed in different ways: to look up if a subgoal is in 
the table, and if not insert it; to verify whether a newly found answer is already in 
the table, and if not insert it; and to pick up answers to consumer nodes. Hence, 
a correct design of the algorithms to access and manipulate the table data is 
a critical issue to obtain an efficient implementation. Our implementation uses 
tries as the basis for tables, as proposed by Ramakrishnan et al. in [12]. 

Figure 4 shows the general table structure for a tabled predicate. Table 
lookup starts from the table entry data structure. Each tabled predicate has one
such structure, which is allocated at compilation time. Calls to the predicate will 
always access the table starting from this point. 

The table entry points to a tree of trie nodes, the subgoal trie structure. 
More precisely, each different call to the tabled predicate in hand corresponds 
to a unique path through the subgoal trie structure. Such a path always starts 
from the table entry, follows a sequence of subgoal trie data units, the subgoal 
trie nodes, and terminates at a leaf data structure, the subgoal frame. 

Each subgoal frame stores information about the subgoal, namely an entry 
point to its answer trie structure. Each unique path through the answer trie 
data units, the answer trie nodes, corresponds to a different answer to the entry 
subgoal. To obtain the set of available answers for a tabled subgoal, the leaf 
answer nodes are chained in a linked list in insertion time order, so that we can 
recover answers in the same order they were inserted. The subgoal frame points 
to the first and last answer in this list. Thus, a consumer node only needs to 
point at the leaf node for its last consumed answer, and consumes more answers 
just by following the chain. To load an answer, the trie nodes are traversed in 
bottom-up order and the answer is reconstructed.

Fig. 4. Using tries to organise the table space
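The answer trie with its chain of leaves can be sketched as follows. The classes and function names are hypothetical, answers are simplified to tuples of string tokens, and only the answer side of the table is modelled.

```python
from typing import Dict, Iterator, Optional, Tuple


class TrieNode:
    def __init__(self, token: Optional[str], parent: Optional["TrieNode"]) -> None:
        self.token = token
        self.parent = parent
        self.children: Dict[str, "TrieNode"] = {}
        self.is_answer = False
        self.next_answer: Optional["TrieNode"] = None   # insertion-order chain of leaves


class SubgoalFrame:
    def __init__(self) -> None:
        self.root = TrieNode(None, None)
        self.first: Optional[TrieNode] = None
        self.last: Optional[TrieNode] = None

    def insert(self, answer: Tuple[str, ...]) -> bool:
        """Insert an answer given as a tuple of tokens; True if it was new."""
        node = self.root
        for tok in answer:
            if tok not in node.children:
                node.children[tok] = TrieNode(tok, node)
            node = node.children[tok]
        if node.is_answer:
            return False
        node.is_answer = True
        if self.last is None:
            self.first = node
        else:
            self.last.next_answer = node
        self.last = node
        return True


def load(leaf: TrieNode) -> Tuple[str, ...]:
    """Rebuild an answer by walking bottom-up from its leaf to the root."""
    tokens = []
    node = leaf
    while node.parent is not None:
        tokens.append(node.token)
        node = node.parent
    return tuple(reversed(tokens))


def answers_after(leaf: Optional[TrieNode], sf: SubgoalFrame) -> Iterator[Tuple[str, ...]]:
    """Answers a consumer has not seen yet; `leaf` is its last consumed answer."""
    node = sf.first if leaf is None else leaf.next_answer
    while node is not None:
        yield load(node)
        node = node.next_answer
```

A consumer only has to remember one leaf pointer, which is what makes resuming consumption cheap.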




Fig. 5. The two types of cut operations in a tabling environment 



3.2 Inner and Outer Cut Operations 

We consider two types of pruning in a tabling environment: cuts that do not 
prune alternatives in tabled predicates - inner cut operations, and cuts that 
prune alternatives in tabled predicates - outer cut operations. In Fig. 5 we illus- 
trate four different situations corresponding to inner and outer cut operations. 
Below each illustration we present a block of Prolog code that may lead to such 
situations. For simplicity, we assume that t is the only tabled predicate defined
and that the “...” parts do not include t. Note that the rightmost situation
only occurs if a parallel tabling environment is considered, as otherwise t will
only be called if the cut operation in the first alternative of s is not executed.

Cut semantics for outer cut operations is still an open problem. A major
problem is that of pruning generator nodes. Pruning a generator node cancels
its further completion and puts the table space in an inconsistent state. For
sequential tabling, a simple approach is to delete the whole table data structures 
related with the pruned subgoal and recompute it from the beginning when it 
reappears. This can be safely done because when a generator is pruned all variant 
consumers are also pruned. On the other hand, for parallel tabling, it is possible 
that generators will execute earlier, and in a different branch than in sequential 
execution. In fact, different workers may execute the generator and the consumer 
goals. Workers may have consumer nodes while not having the corresponding 
generator nodes in their branches. Conversely, the owner of a generator node 
can have consumer nodes being executed by several different workers. 

The intricate dependencies in a parallel tabled evaluation make pruning
a very complex problem. A possible solution to this problem is to move
the generator’s role to a non-pruned dependent consumer node, if any, in order
to allow further exploitation of the generator’s unexploited branches. Such a
solution will require that the other non-pruned consumer nodes recompute and
update their dependencies relative to the new generator node. Otherwise, if all
dependent consumer nodes are also pruned, we can suspend the execution stacks 
and the table data structures of the pruned subgoal and try to resume them when 
the next variant call takes place. Further research is still necessary in order to 
study the combination of pruning and parallel tabling. Currently, OPTYap still 
does not support outer cut operations and for such cases execution is aborted. 
Outer cut operations are detected when a worker moves up in the tree either 
because it is executing a cut operation or it has received a signal informing that 
its branch has been pruned away by another worker.



3.3 Detecting Speculative Tabled Answers 

As mentioned before, a main goal in the implementation of speculative tabling 
is to allow storing valid answers immediately. We would like to maintain the 
same performance as for the programs without cut operators. In this subsection, 
we introduce and describe the data structures and implementation extensions 
required to efficiently detect if a tabled answer is speculative or not. 

We introduced a global bitmap register named GLOBAL_pruning_workers to 
keep track of the workers that are executing branches that contain cut operators 
and that, in consequence, may prune the current goal. Additionally, each worker 
maintains a local register, LOCAL_safe_scope, that references the youngest node
that cannot be pruned by any pruning operation executed by itself. 

The correct manipulation of these new registers is achieved by introducing a 
new instruction clause_with_cuts to mark the blocks of code that include cut 
instructions. During compilation, the code generated for the clauses containing 
cut operators is extended to include the clause_with_cuts instruction so that 
it is the first instruction to be executed for such clauses. When a worker loads a 
clause_with_cuts instruction, it executes the clause_with_cuts() procedure.
Figure 6 details the pseudo-code that implements the clause_with_cuts()
procedure. It sets the worker’s bit in the GLOBAL_pruning_workers register and,
if the LOCAL_safe_scope register is younger than the current node, it updates
the LOCAL_safe_scope register to refer to the current node. The current node is
the resulting top node if a pruning operation takes place in the clause in hand.



clause_with_cuts() {
  if (LOCAL_safe_scope == NULL) {                  // first time here
    insert_into_bitmap(GLOBAL_pruning_workers, WORKER_id)
    LOCAL_safe_scope = B                           // B is a pointer to the current choice point
  } else if (LOCAL_safe_scope is younger than B) {
    LOCAL_safe_scope = B
  }
}

Fig. 6. Pseudo-code for clause_with_cuts()
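For readers who want to experiment, the procedure can be mirrored in a few lines of Python. The sketch assumes that the age of a node is modelled by its depth (a larger depth means a younger node), that the global bitmap is a set of worker ids, and that `Worker` is a hypothetical class holding the local register.

```python
from typing import Optional, Set


class Worker:
    def __init__(self, worker_id: int) -> None:
        self.id = worker_id
        self.safe_scope: Optional[int] = None   # depth of LOCAL_safe_scope, None if unset


def clause_with_cuts(worker: Worker, pruning_workers: Set[int], current_depth: int) -> None:
    """Run on entry to a clause that contains a cut."""
    if worker.safe_scope is None:               # first clause with cuts on this branch
        pruning_workers.add(worker.id)
        worker.safe_scope = current_depth
    elif worker.safe_scope > current_depth:     # register is younger than the current node
        worker.safe_scope = current_depth
```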



When a worker finds a new answer for a tabled subgoal, it first inserts it 
into the table space and then checks if the answer is safe from being pruned. 
When this is the case, the answer is inserted at the end of the list of available 
answers, as usual. Otherwise, if it is found that the answer can be pruned by 
another worker, its availability is delayed. Figure 7 presents the pseudo-code 
that implements the checking procedure. 



speculative_tabled_answer(generator node G) {     // G is the generator...
  prune_wks = GLOBAL_pruning_workers               // ...for the answer being checked
  delete_from_bitmap(prune_wks, WORKER_id)
  if (prune_wks is not empty) {                    // there are workers that may...
    or_fr = youngest_or_frame()                    // ...execute pruning operations
    depth = OrFr_depth(or_fr)
    scope_depth = OrFr_depth(G->or_frame)
    while (depth > scope_depth) {                  // check the branch till...
      alt_number = branch(WORKER_id, depth)        // ...the generator
      for (w = 0; w < number_workers; w++) {
        if (w is in prune_wks && w is in OrFr_members(or_fr) &&
            branch(w, depth) < alt_number &&
            OrFr_node(or_fr) is younger than LOCAL_safe_scope(w))
          return or_fr                             // the answer can be pruned by worker w
      }
      or_fr = OrFr_next(or_fr)
      depth = OrFr_depth(or_fr)
    }
  }
  return NULL                                      // the answer is safe from being pruned
}

Fig. 7. Pseudo-code for speculative_tabled_answer()



The procedure starts by determining if there are workers that may execute 
pruning operations. If so, it checks the safeness of the branch where the tabled 
answer was found. The branch only needs to be checked until the corresponding 
generator node, as otherwise it would be an outer cut operation. A branch is
found to be safe if it is leftmost, or if the workers in the branches to the left
cannot prune it. If it is found that the answer being checked can be speculative, 
the procedure returns the or-frame that corresponds to the youngest node where 
the answer can be pruned by a worker in a left branch. That or-frame is where 
the answer should be left pending. Otherwise, if it is found that the answer is 
safe, the procedure returns NULL. 
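The check can be tried out with the following executable sketch. It follows the structure of Fig. 7 under explicit assumptions: node age is modelled by depth (larger means younger), registers are passed as arguments instead of being global, and the `OrFrame` class and parameter names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class OrFrame:
    depth: int
    members: Set[int]
    parent: Optional["OrFrame"] = None


def speculative_tabled_answer(
    me: int,
    youngest: OrFrame,
    generator_depth: int,
    pruning_workers: Set[int],
    branch: Dict[Tuple[int, int], int],
    safe_scope_depth: Dict[int, int],
) -> Optional[OrFrame]:
    """Return the youngest or-frame where the answer may be pruned, else None."""
    candidates = pruning_workers - {me}
    if not candidates:
        return None
    fr: Optional[OrFrame] = youngest
    while fr is not None and fr.depth > generator_depth:
        mine = branch[(me, fr.depth)]
        for w in sorted(candidates & fr.members):
            # w is to the left here and its safe scope is older than this node
            if branch[(w, fr.depth)] < mine and fr.depth > safe_scope_depth[w]:
                return fr
        fr = fr.parent
    return None
```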

3.4 Pending Tabled Answers 

Tabled answers are inserted in advance into the table space. However, if a tabled 
answer is found to be speculative, its insertion in the list of available answers 
is delayed and the answer is left pending. This prevents unsafe answers from being
consumed elsewhere in the tree. Only when it is found that a pending answer
is safe from being pruned is it released as a valid answer and inserted at the
end of the list of available answers for the subgoal. Dealing with pending tabled
answers requires data structures that allow pending answers to be pruned or
released efficiently.

Remember that speculative tabled answers are left pending in nodes. To allow 
access to the set of pending answers for a node, a new data field was introduced 
in the or-frame data structure, the OrFr_tg_answers field. New data structures 
were also introduced to efficiently keep track of the pending answers being found 
for the several tabled subgoals. Figure 8 details that data structure organisation. 





Fig. 8. Dealing with pending tabled answers 



The figure shows a situation where three tabled answers, answer-x, answer-y
and answer-z, were found to be speculative and therefore have all been left
pending in a common node N. N is the youngest node where a worker in a left
branch, W in the figure, holds a LOCAL_safe_scope register pointing to a node
older than N.

Pending answers found for the same subgoal and from the same branch are
addressed by a common top data structure. As the answers in the figure were
found in different subgoal/branch pairs, three top data structures were required:
answer-x, answer-y and answer-z were found, respectively, in branches 2, 3
and 3 for the subgoals corresponding to generator nodes G1, G1 and G2. These
data structures are organised in older to younger generator order and by reverse 
branch order when they are for the same generator. Hence, each data structure 
contains two types of pointers to follow the chain of structures, one points to 
the structure that corresponds to the next younger generator node, while the 
other points to the structure that corresponds to the next branch within the 
same generator. 

Blocks of answers address the set of pending answers for a subgoal/branch 
pair. Each block points to a fixed number of answers. By linking the blocks we 
can have a large number of answers for the same subgoal/branch pair. Note 
that the block data structure does not hold the representation of a pending 
answer, only a pointer to the leaf node of the answer trie structure representing 
the pending answer. As we will see, with this simple scheme, we can easily 
differentiate between occurrences of the same speculative answer in different 
branches. Figure 9 shows the procedure that OPTYap executes when a tabled 
answer is found. 



tabled_answer(answer A, generator node G) {       // G is the generator...
  sf = subgoal_frame(G)                            // ...for the answer A
  leaf_node = insert_into_table_space(A, sf)
  if (leaf_node is a valid answer) {
    fail()                                         // already in the list of available answers for sf
  } else {
    or_fr = speculative_tabled_answer(G)
    if (or_fr == NULL)                             // the answer is safe from being pruned
      valid_answer(leaf_node, sf)
    else
      left_pending(leaf_node, or_fr)
  }
}

Fig. 9. Pseudo-code for tabled_answer()



The procedure starts by inserting the answer in the table space. Then, it 
verifies if the answer is already tabled as a valid answer and, if so, the execution 
fails as usual. Otherwise, it checks if the answer is safe from being pruned. If
this is the case, the answer is tabled as a valid answer. Otherwise, it is left pending.

Suppose now that we have an answer A left pending in a node N and that
a new occurrence of A is found elsewhere. Two situations may happen: the new
occurrence of A is also speculative or it is safe from being pruned. In the first
case, A is left pending in a node of the current branch. This is necessary because
there is no way to know beforehand in which branch, if any, A will first be proved
to be not speculative. In the second case, A is released as a valid answer and
inserted in the list of available answers for the subgoal in hand. Note that in this
case, A still remains left pending in node N. In any case, A is only represented
once in the table space.
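The three outcomes for a newly found answer, including repeated occurrences of the same speculative answer, can be exercised with the sketch below. It is a simplified model: `SubgoalAnswers` is a hypothetical class, answers and node names are strings, and the result of the speculative check is passed in as `pending_node` instead of being computed.

```python
from typing import Dict, List, Optional, Set


class SubgoalAnswers:
    def __init__(self) -> None:
        self.stored: Set[str] = set()             # answers present in the answer trie
        self.available: List[str] = []            # valid answers, visible to consumers
        self.pending: Dict[str, List[str]] = {}   # node name -> answers left pending there


def tabled_answer(t: SubgoalAnswers, answer: str, pending_node: Optional[str]) -> str:
    """`pending_node` stands for the result of the speculative check:
    None if the answer is safe, otherwise the node where it must wait."""
    t.stored.add(answer)                          # represented once, however often it is found
    if answer in t.available:
        return "fail"                             # repeated valid answer
    if pending_node is None:
        t.available.append(answer)
        return "valid"
    t.pending.setdefault(pending_node, []).append(answer)
    return "pending"
```

Note that releasing an answer does not touch the pending entries; in this model they simply stay where they are until their node is left or pruned.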

With this scheme, OPTYap implements the following algorithm in order to
release answers as soon as possible: the last worker W leaving a node N with
pending tabled answers determines the next node M on its branch that can be
pruned by a worker to the left. The pending answers from N that correspond
to generator nodes equal to or younger than M are made available (if an answer
is already valid, nothing is done), while the remaining ones are moved from N to
M. Note that W only needs to check for the existence of M up to the oldest
generator node for the pending answers stored in N. To simplify finding the
oldest generator node we organised the top data structures in older to younger
generator order (see Fig. 8).

On the other hand, when a node N is pruned, its pending tabled answers
can be in one of three situations: only left pending in N; also left pending in
other nodes; or already valid answers. Note that in all three situations no interaction
with the table space is needed and N can simply be pruned away. Even in the
first situation, we may keep the answers in the table and wait until completion,
as in the meantime such answers can still be generated again in other branches.
So, only when a subgoal is completely evaluated is it required that the answer
trie nodes representing speculative answers be removed from the table. As this
requires traversing the whole answer trie structure, for simplicity and efficiency,
this is only done in the first call to the tabled subgoal after it has been completed.



4 Conclusions 

In this paper we discussed the management of speculative computations in or- 
parallel tabled logic programs. Our approach deals with inner pruning at a first 
step and we address speculative tabled computations by delaying the point at 
which their answers are made available in the table. With this support, OPTYap 
is now able to execute a wider range of applications without introducing signif- 
icant overheads (less than 1%) for applications without cuts. 

Support for outer cuts is a delicate issue. To our knowledge, the first proposal 
on outer cuts for sequential tabling was presented by Guo and Gupta in [13]. 
They argue that cuts in tabling systems are most naturally interpreted as a 
commit, and they define the cut operator in terms of the operational semantics of 
their tabling strategy [14], which is based on recomputation of so-called looping 
alternatives. In more recent work, Castro and Warren propose the demand-based 
once pruning operator [15], whose semantics are independent of the operational 
semantics for tabling, but which does not fully support cut. We believe that a 
complete design for outer cut operations in sequential tabling is still an open 
and, arguably, a controversial problem. 

To fully support pruning in parallel tabling, further work is also required. We 
need to do it correctly, that is, in such a way that the system will not break but 
instead produce sensible answers according to the proposed sequential semantics, 
and well, that is, allow useful pruning with good performance.
Abstract. There are two well-known approaches to programming with
names, binding, and equivalence up to consistent renaming: representing
names and bindings as concrete identifiers in a first-order language (such
as Prolog), or encoding names and bindings as variables and abstractions
in a higher-order language (such as λProlog). However, both approaches
have drawbacks: the former often involves stateful name-generation and
requires manual definitions for α-equivalence and capture-avoiding
substitution, and the latter is semantically very complicated, so reasoning
about programs written using either approach can be very difficult.
Gabbay and Pitts have developed a new approach to encoding abstract
syntax with binding based on primitive operations of name-swapping and
freshness. This paper presents αProlog, a logic programming language
that uses this approach, along with several illustrative example programs
and an operational semantics.



1 Introduction 

Names, binding, α-equivalence, and capture-avoiding substitution are endemic
phenomena in logics and programming languages. The related concepts of name
freshness, fresh name generation and equivalence up to consistent renaming also
appear in many other domains, including state identifiers in finite automata,
nonces in security protocols, and channel names in process calculi. Dealing with
names is therefore an important practical problem in meta-programming, and
there are a variety of approaches to doing so, involving different tradeoffs
[3, 4, 9, 12, 15, 18, 21, 24]. The following are important desiderata for such techniques:

• Convenience: Basic operations including substitution, α-equivalence, and
fresh name generation should be built-in.

• Simplicity: The semantics of the meta-language should be as simple as
possible in order to facilitate reasoning about programs.

• Abstraction: Low-level implementation details concerning names should be
taken care of by the meta-language and hidden from the programmer.

• Faithfulness/Adequacy: Object terms should be in bijective correspondence
with the values of some meta-language type.

In first-order abstract syntax (FOAS), object languages are encoded using
first-order terms (e.g. Prolog terms or ML datatypes). Names are encoded using a
concrete datatype var such as strings, and binders are encoded using first-order
function symbols like lam : var × exp → exp. FOAS has several disadvantages:
the encoding does not respect α-equivalence, damaging adequacy; fresh names
are often generated using side-effects, complicating the semantics; and
operations like α-equivalence and substitution must be implemented manually (and
painstakingly). Nameless encodings like de Bruijn indices [3] ameliorate some,
but not all, of these problems.

In higher-order abstract syntax (HOAS) [18], object languages are encoded
using higher-order terms (e.g. λ-terms in λProlog [16]). In HOAS, names are
encoded as meta-language variables and binders are encoded with meta-language
λ-abstraction using higher-order function symbols like lam : (exp → exp) → exp.
Capture-avoiding substitution and α-equivalence need only be implemented
once, in the meta-language, and can be inherited by all object languages.
However, because of the presence of types like exp that are defined via negative
recursion, the semantics of HOAS is complex [10] and induction principles are
difficult to develop. Moreover, HOAS cannot deal with open object terms, that
is, terms containing free variables. In weak HOAS [4], induction principles are
recovered by encoding names using a concrete type var, and encoding binders
as λ-abstractions using constructors like lam : (var → exp) → exp. In
this approach, α-equivalence is still built-in, but substitution must be defined.
Also, weak HOAS encodings may not be adequate because of the presence of
exotic terms, that is, closed terms of type exp which do not correspond to any
object term; additional well-formedness predicates are needed to recover adequacy.

Recently, Gabbay and Pitts developed a novel approach to encoding names
and binding [8], based on taking name-swapping and freshness as fundamental
operations on names. This approach has been codified by Pitts as a theory
of first-order logic called nominal logic [19], in which names are a first-order
abstract data type admitting only swapping, binding, and equality and
freshness testing operations. Object language variables and binding can be encoded
using names x, y and name-abstractions x.t, which are considered equal up to
α-equivalence. For example, object variables x and binders λx.t can be encoded
as nominal terms var(x) and abstractions lam(x.t), where var : id → exp and
lam : ⟨id⟩exp → exp.

We refer to this approach to programming with names and binding as
nominal abstract syntax (NAS). NAS provides α-equivalence and fresh name
generation for free, while remaining semantically simple, requiring neither recursive
types nor stateful name-generation. Furthermore, names are sufficiently abstract 
that the low-level details of name generation can be hidden from the program- 
mer, and exotic terms are not possible in NAS encodings. However, names are 
still sufficiently concrete that there is no problem working with open terms. 
Therefore NAS makes possible a distinctive new style of meta-programming. 

This paper presents αProlog, a logic programming language based on the
Horn clause fragment of nominal logic which supports nominal abstract
syntax. In the rest of this paper, we describe αProlog, and discuss its unification
and constraint solving algorithm (due to Urban, Pitts, and Gabbay [25]) and
operational semantics. We also discuss an important open issue: our current
implementation is incomplete, because complete proof search in αProlog requires
solving equivariant unification problems, which are NP-complete in general [2].
However, this problem can often be avoided, and we give several examples of
interesting languages and relations that can be encoded using NAS in αProlog.
We conclude with a discussion of related languages and future work.

2 Syntax 

The term language of αProlog consists of (nominal) terms, constructed according
to the grammar

\[ t ::= X \mid \mathsf{n} \mid \mathsf{n}.t \mid (\mathsf{n}\ \mathsf{m}) \cdot t \mid f(\bar{t}) \]

where $X$ is a (logic) variable, $f$ is a function symbol (we write $\bar{t}$ to denote
a (possibly empty) sequence $(t_1, \ldots, t_n)$), and $\mathsf{n}, \mathsf{m}$ are names. By convention,
function symbols are lower case, logic variables are capitalized, and names are
printed using the sans-serif typeface. We shall refer to a term of the form $f(\bar{t})$ as
an atomic term. Terms of the form $\mathsf{n}.t$ are called abstractions, and terms of the
form $(\mathsf{n}\ \mathsf{m}) \cdot t$ are called swappings, which intuitively denote the result of swapping
names $\mathsf{n}, \mathsf{m}$ within $t$. Swapping takes higher precedence than abstraction, i.e.,
$(\mathsf{a}\ \mathsf{b}) \cdot \mathsf{c}.t = ((\mathsf{a}\ \mathsf{b}) \cdot \mathsf{c}).t$. Variables cannot be used to form abstractions and
transpositions, i.e., $X.t$ and $(X\ Y) \cdot t$ are not legal terms.

αProlog has an ML-like polymorphic type system. Types are classified into
two kinds: type, the kind of all types, and name_type, the kind of types
inhabited only by names. Types classify terms, and include atomic type constructor
applications $c(\sigma_1, \ldots, \sigma_n)$ as well as type variables $\alpha$ and abstraction types $\langle\nu\rangle\sigma$.
In an abstraction type, the kind of $\nu$ must be name_type. Type constructor and
uninterpreted function symbol declarations are of the form $c : (\bar{\kappa}) \to \kappa'$ and
$f : (\bar{\sigma}) \to \sigma'$, where $\kappa$ and $\sigma$ indicate kinds and types respectively. The
result type of an uninterpreted function symbol may not be a built-in type or a
name_type. Relation symbols are declared as pred $p(\bar{\sigma})$ and interpreted
function symbols as func $f(\bar{\sigma}) = \sigma'$. Type abbreviations can be made with the
declaration type $c(\bar{\alpha}) = \sigma$. The latter three declaration forms are loosely based
on Mercury syntax [23]. We assume built-in and self-explanatory type and
function symbols for pairs ($(x, y) : \sigma \times \sigma'$) and lists ($[\,]$, $x :: y$, $[x|y] : [\sigma]$).

Atomic formulas $A$ are terms of the form $p(t_1, \ldots, t_n)$, where $p$ is a relation
symbol. Constraints $C$ include freshness formulas $t \# u$ and equality formulas
$t = u$. In $t \# u$, the term $t$ must be of some name type $\nu$ : name_type,
whereas $u$ may be of any type; in $t = u$, both $t$ and $u$ must be of the same
type. Goals $G$ consist of sequences of constraints and atomic formulas. Program
clauses include Horn clauses of the form $A$ :- $G$ and function-definition clauses
of the form $f(\bar{t}) = t'$ :- $G$, which introduce a (conditional) rewrite rule for an
atomic term with an interpreted head symbol $f$.

By convention, constant symbols are function symbols applied to the empty
argument list; we write $c$ instead of $c()$, and $c : \tau$ instead of $c : () \to \tau$. This also
applies to propositional and type constants. We abbreviate clauses $A$ :- $()$ and
$f(\bar{t}) = t'$ :- $()$, where $()$ denotes the empty sequence, as $A$ and $f(\bar{t}) = t'$. We
write $V(\cdot)$ and $N(\cdot)$ for the variables or names of a term or formula. Observe
that $N(t)$ includes all occurrences of names in $t$, even abstracted ones, hence
$N(\mathsf{x}.\mathsf{x}) = \{\mathsf{x}\}$. We say a nominal term $e$ is ground when $V(e) = \emptyset$; names may
appear in ground terms, so $f(X, Y)$ is not ground but $f(\mathsf{x}, \mathsf{y})$ is. We write $VN(t)$
for $V(t) \cup N(t)$.

3 Semantics 

In this section we present an operational semantics for αProlog programs. We
describe the equality and freshness theory of nominal logic, nominal unification,
and αProlog's execution algorithm, emphasizing the main novelties relative to
standard unification and logic programming execution.

3.1 Equality, Freshness, and Unification 

Figure 1 shows the axioms of equality and freshness for ground nominal terms
(based on [19, 25]). The swapping axioms $(S_1)$–$(S_5)$ describe the behavior of
swapping. From now on, we assume that all terms are normalized with respect
to these axioms (read right-to-left as rewrite rules), so that swaps are not present
in ground terms and are present only surrounding variables in non-ground terms.

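The next two axioms $(A_1)$, $(A_2)$ define equality for abstractions. The first
axiom is a simple congruence property. The second guarantees that abstractions
are equal "up to renaming". Two abstractions of different names $\mathsf{x}.t$, $\mathsf{y}.u$ are equal
just in case their bodies are equal up to swapping the names (i.e., $t = (\mathsf{x}\ \mathsf{y}) \cdot u$) and
$\mathsf{x}$ does not appear free in $u$ (i.e., $\mathsf{x} \# u$). Symmetrically, it suffices to check $\mathsf{y} \# t$;
the two conditions are equivalent if $t = (\mathsf{x}\ \mathsf{y}) \cdot u$. For example, $\mathsf{x}.g(\mathsf{x}) = \mathsf{y}.g(\mathsf{y})$
and $\mathsf{x}.f(\mathsf{x}, \mathsf{y}) = \mathsf{z}.f(\mathsf{z}, \mathsf{y})$, but $\mathsf{x}.f(\mathsf{x}, \mathsf{y}) \neq \mathsf{y}.f(\mathsf{y}, \mathsf{x})$ because $\mathsf{x} \# f(\mathsf{y}, \mathsf{x})$ fails.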

The freshness axioms $(F_1)$–$(F_5)$ describe the freshness relation. Intuitively,
$\mathsf{x} \# t$ means "name $\mathsf{x}$ does not appear unbound in $t$". For example, it is never
the case that $\mathsf{x} \# \mathsf{x}$, whereas any two distinct names are fresh ($\mathsf{x} \neq \mathsf{y} \Rightarrow \mathsf{x} \# \mathsf{y}$).
Moreover, freshness passes through function symbols (in particular, any name
is fresh for any constant). The abstraction freshness rules are more interesting:
$\mathsf{x} \# \mathsf{x}.t$ is unconditionally valid because any name is fresh for a term in which it
is immediately abstracted, whereas if $\mathsf{x}$ and $\mathsf{y}$ are different names, then $\mathsf{x} \# \mathsf{y}.t$
just in case $\mathsf{x} \# t$.



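\[
\begin{array}{ll}
(S_1) & (\mathsf{n}\ \mathsf{m}) \cdot \mathsf{n} = \mathsf{m} \\
(S_2) & (\mathsf{n}\ \mathsf{m}) \cdot \mathsf{m} = \mathsf{n} \\
(S_3) & \mathsf{x} \# \mathsf{n},\ \mathsf{x} \# \mathsf{m} \Rightarrow (\mathsf{n}\ \mathsf{m}) \cdot \mathsf{x} = \mathsf{x} \\
(S_4) & (\mathsf{n}\ \mathsf{m}) \cdot f(\bar{t}) = f((\mathsf{n}\ \mathsf{m}) \cdot \bar{t}) \\
(S_5) & (\mathsf{n}\ \mathsf{m}) \cdot (\mathsf{x}.t) = ((\mathsf{n}\ \mathsf{m}) \cdot \mathsf{x}).((\mathsf{n}\ \mathsf{m}) \cdot t) \\[4pt]
(A_1) & t = u \Rightarrow \mathsf{n}.t = \mathsf{n}.u \\
(A_2) & t = (\mathsf{n}\ \mathsf{m}) \cdot u \wedge \mathsf{n} \# u \Rightarrow \mathsf{n}.t = \mathsf{m}.u \\[4pt]
(F_1) & \mathsf{n} \neq \mathsf{m} \Rightarrow \mathsf{n} \# \mathsf{m} \\
(F_2) & \neg(\mathsf{n} \# \mathsf{n}) \\
(F_3) & \bigwedge_{i=1}^{n} \mathsf{n} \# t_i \Rightarrow \mathsf{n} \# f(\bar{t}) \\
(F_4) & \mathsf{n} \# \mathsf{n}.t \\
(F_5) & \mathsf{n} \# \mathsf{m} \wedge \mathsf{n} \# t \Rightarrow \mathsf{n} \# \mathsf{m}.t
\end{array}
\]

Fig. 1. Ground equational and freshness theory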
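
For ground terms, the theory of Figure 1 is directly executable. The following Python sketch is an illustration only: the classes `Name`, `Abs` and `App` and the function names are hypothetical and are not part of αProlog. It decides swapping, freshness and equality by structural recursion on ground terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    id: str


@dataclass(frozen=True)
class Abs:          # the abstraction n.t
    name: Name
    body: object


@dataclass(frozen=True)
class App:          # the atomic term f(t1, ..., tn)
    head: str
    args: tuple = ()


def swap(n, m, t):
    """Apply the transposition (n m) to a ground term (axioms S1 to S5)."""
    if isinstance(t, Name):
        return m if t == n else n if t == m else t
    if isinstance(t, Abs):
        return Abs(swap(n, m, t.name), swap(n, m, t.body))
    return App(t.head, tuple(swap(n, m, a) for a in t.args))


def fresh(n, t):
    """Decide n # t for a ground term (axioms F1 to F5)."""
    if isinstance(t, Name):
        return t != n
    if isinstance(t, Abs):
        return t.name == n or fresh(n, t.body)
    return all(fresh(n, a) for a in t.args)


def alpha_eq(t, u):
    """Decide equality of ground terms up to renaming (axioms A1, A2)."""
    if isinstance(t, Name) or isinstance(u, Name):
        return t == u
    if isinstance(t, Abs) and isinstance(u, Abs):
        if t.name == u.name:                                   # (A1)
            return alpha_eq(t.body, u.body)
        return (fresh(t.name, u.body)                          # (A2)
                and alpha_eq(t.body, swap(t.name, u.name, u.body)))
    if isinstance(t, App) and isinstance(u, App):
        return (t.head == u.head and len(t.args) == len(u.args)
                and all(alpha_eq(a, b) for a, b in zip(t.args, u.args)))
    return False
```

With names x, y, z, this sketch accepts x.g(x) = y.g(y) and x.f(x, y) = z.f(z, y), and rejects x.f(x, y) = y.f(y, x) because the freshness check on f(y, x) fails. Non-ground terms need the constraint-based algorithm of [25], which this sketch does not attempt.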
Nominal unification is unification of nominal terms up to α-equivalence (as
formalized by the axioms of Figure 1). For ground terms, nominal unification
coincides with α-equivalence: for example, the term $\mathsf{n}.\mathsf{n}$ unifies with $\mathsf{m}.\mathsf{m}$, but $\mathsf{n}.\mathsf{m}$
and $\mathsf{n}.\mathsf{n}$ do not unify. However, non-ground terms such as $\mathsf{n}.X$ and $\mathsf{m}.X$ unify only
subject to the freshness constraints $\mathsf{n} \# X$ and $\mathsf{m} \# X$. A freshness constraint
of the form $\mathsf{n} \# X$ states that $X$ may not be instantiated with a term containing
a free occurrence of $\mathsf{n}$. The problem $\mathsf{n}.X \approx^? \mathsf{m}.Y$ is unified by substitution
$X = (\mathsf{n}\ \mathsf{m}) \cdot Y$ subject to the constraint $\mathsf{n} \# Y$; that is, $X$ must be identical to
$Y$ with $\mathsf{n}$ and $\mathsf{m}$ swapped, and $\mathsf{n}$ must be fresh for $Y$. The nominal unification
algorithm therefore also must solve freshness (or disunification) subproblems of
the form $t \#^? u$, where $t$ is of name type.

Urban, Pitts, and Gabbay [25] developed an algorithm for solving nominal
unification and freshness constraint problems of the form encountered in
αProlog. A modified form of this algorithm is used in our current
implementation. For space reasons, we omit a fuller discussion of the details of the algorithm,
and note that any other constraint solving procedure for nominal equality and
freshness constraints could be used instead.



3.2 Operational Semantics 

We now present the operational semantics of αProlog programs. This semantics
is based loosely on that of constraint logic programming [11], over the domain
of nominal terms as axiomatized in Figure 1. A program is a set $\mathcal{P}$ of program
clauses, closed under permutative renaming of names and variables. A program
state consists of a goal $G$ and a set $\nabla$ of equality and freshness constraints; we shall
write $\langle G \mid \nabla \rangle$ for such a state. An answer to this query is a set of constraints
$\nabla'$. We define the operational semantics of an αProlog query using transitions
of the form $\langle G \mid \nabla \rangle \to \langle G' \mid \nabla' \rangle$ and write $G \to^* \nabla$ if $\langle G \mid \emptyset \rangle \to^* \langle \emptyset \mid \nabla \rangle$.
The rules for the transitions are as follows:

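\[
\begin{array}{ll}
\langle C, G \mid \nabla \rangle \to \langle G \mid \{C\} \cup \nabla \rangle & \text{if } \{C\} \cup \nabla \text{ satisfiable} \\
\langle p(\bar{t}), G \mid \nabla \rangle \to \langle \bar{t} = \bar{u}, G', G \mid \nabla \rangle & \text{if } (p(\bar{u}) \text{ :- } G') \in \mathcal{P} \\
 & \text{and } VN(p(\bar{u}) \text{ :- } G') \cap VN(p(\bar{t}), G, \nabla) = \emptyset \\
\langle G \mid \nabla \rangle \to \langle G \mid \nabla' \rangle & \text{if } \nabla' \models \nabla
\end{array}
\]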

In the last rule, $\nabla' \models \nabla$ is constraint entailment in the theory of Figure 1; this
rule permits constraint simplification via unification. In αProlog, as usual in
logic programming, variables in program clauses or rewriting rules are renamed
to new variables during backchaining; in addition, names are freshened to new
names. Rewriting rules $f(\bar{t}) = u$ :- $G$ defining function symbols are translated
to a clausal form $p_f(\bar{t}, u)$ :- $G'$ via flattening, as in many Prolog systems.

It is straightforward to show that our operational semantics is sound with
respect to an appropriate variant of nominal logic. The proof of this fact relies
on the soundness of nominal unification for nominal equational satisfiability
([25, Thm. 2]). However, completeness fails, because we have not taken into
account equivariance, an important property of nominal logic that guarantees
that validity is preserved by name-swapping [19]. Formally, the equivariance
axiom asserts that $p(\bar{t}) \Rightarrow p((\mathsf{n}\ \mathsf{m}) \cdot \bar{t})$ is valid in nominal logic for any atomic
formula $p(\bar{t})$ and names $\mathsf{n}, \mathsf{m}$. For example, for any binary relation pred $p(\nu, \nu)$
with $\nu$ : name_type, we have $p(\mathsf{x}, \mathsf{y}) \Leftrightarrow p(\mathsf{y}, \mathsf{z})$ valid in both directions because
the swapping $(\mathsf{x}\ \mathsf{y})(\mathsf{y}\ \mathsf{z})$ translates between them. But many-to-one renamings
may not preserve validity: for example, $\mathsf{x} \# \mathsf{y} \Rightarrow \mathsf{z} \# \mathsf{z}$ is not valid.

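Because of equivariance, backchaining based on nominal unification is
incomplete. For example, given program clause $p(\mathsf{n})$ where $\mathsf{n}$ is a name, the goal $p(\mathsf{n})$
cannot be solved. Even though $p(\mathsf{n}) \Rightarrow p(\mathsf{n})$ is obviously valid, proof search fails
because the program clause $p(\mathsf{n})$ must be freshened to $p(\mathsf{n}')$, and $p(\mathsf{n}')$ and $p(\mathsf{n})$
do not unify. However, by equivariance these formulas are equivalent in nominal
logic, since $p(\mathsf{n}') \Rightarrow p((\mathsf{n}\ \mathsf{n}') \cdot \mathsf{n}') \Rightarrow p(\mathsf{n}) \Rightarrow p((\mathsf{n}\ \mathsf{n}') \cdot \mathsf{n}) \Rightarrow p(\mathsf{n}')$.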

This can be fixed by adding a transition rule 

\[ \langle p(\bar{t}), G \mid \nabla \rangle \to \langle p((\mathsf{a}\ \mathsf{b}) \cdot \bar{t}), G \mid \nabla \rangle . \]

However, this rule introduces nondeterminism. To perform goal-directed proof
search, it seems preferable to replace the naive backchaining rule above with one
based on unifying up to nominal equality modulo a permutation:

\[
\begin{array}{l}
\langle p(\bar{t}), G \mid \nabla \rangle \to \langle \pi \cdot \bar{t} = \bar{u}, G', G \mid \nabla \rangle \\
\quad \text{if } (p(\bar{u}) \text{ :- } G') \in \mathcal{P} \text{ and } VN(p(\bar{u}) \text{ :- } G') \cap VN(p(\bar{t}), G, \nabla) = \emptyset
\end{array}
\]

where $\pi$ is a sequence of transpositions $(\mathsf{a}_1\ \mathsf{b}_1) \cdots (\mathsf{a}_n\ \mathsf{b}_n)$. We use the term
equivariant unification for the problem of unifying up to nominal equality modulo a
permutation. However, even deciding whether an equivariant unification problem
has a solution is NP-complete [2]. This does not necessarily mean that
equivariant unification is impractical. Developing a practical approach to equivariant
unification or constraint solving is the subject of current research, however, and
the current version of αProlog opts for efficiency over completeness. We have
experimented with brute-force search and more advanced techniques for
equivariant unification but have yet to find a satisfactory solution.

Nevertheless, our incomplete implementation of αProlog is still useful.
Equivariant unification does not seem necessary for many interesting αProlog
programs, including all purely first-order programs and all the example programs in
this paper (capture-avoiding substitution, typing, etc.). In fact, the semantics we
have presented is complete for such programs. In a separate paper, we identify
(and prove correct) a condition on program clauses that ensures that nominal
unification-based backchaining is complete [26].

4 Example: The λ-Calculus

The prototypical example of a language with variable binding is the λ-calculus.
In αProlog, the syntax of λ-terms may be described with the following type and
constructor declarations:

    id : name_type.    exp : type.
    var : id → exp.    app : (exp, exp) → exp.    lam : ⟨id⟩exp → exp.
We make the simplifying assumption that the variables of object λ-terms are
constants of type id. Then we can translate λ-terms as follows:

\[ \ulcorner x \urcorner = var(\mathsf{x}) \qquad \ulcorner e_1\ e_2 \urcorner = app(\ulcorner e_1 \urcorner, \ulcorner e_2 \urcorner) \qquad \ulcorner \lambda x.e \urcorner = lam(\mathsf{x}.\ulcorner e \urcorner) \]

It is not difficult to verify that $e$ is a λ-term if and only if $\ulcorner e \urcorner$ is a closed nominal
term, i.e. $FV(\ulcorner e \urcorner) = \emptyset$, and we have that $e =_\alpha e'$ if and only if $\ulcorner e \urcorner = \ulcorner e' \urcorner$.

Example 1 (Typechecking and inference). First, we consider the problem of
typechecking λ-terms. The syntax of types can be encoded as follows:

    tid : name_type.    ty : type.    tvar : tid → ty.    arr : (ty, ty) → ty.

We define contexts ctx as lists of pairs of identifiers and types, and the 3-ary
relation typ relating a context, term, and type:

    type ctx = [id × ty].
    pred typ(ctx, tm, ty).

    typ(C, var(X), T)            :- mem((X, T), C).
    typ(C, app(E1, E2), T')      :- typ(C, E1, arr(T, T')), typ(C, E2, T).
    typ(C, lam(x.E), arr(T, T')) :- x # C, typ([(x, T)|C], E, T').

The predicate mem(σ, [σ]) is the usual predicate for testing list membership.
The side-condition x ∉ Dom(Γ) is translated to the freshness constraint x # C.

Consider the query ? typ([], lam(x.lam(y.var(x))), T). We can reduce this
goal by backchaining against the suitably freshened rule

    typ(C1, lam(x1.E1), arr(T1, U1)) :- x1 # C1, typ([(x1, T1)|C1], E1, U1)

which unifies with the goal with [C1 = [], E1 = lam(y.var(x1)), T = arr(T1, U1)].
This yields subgoal x1 # [], typ([(x1, T1)|C1], E1, U1). The first conjunct is
trivially valid since C1 is a constant. The second is solved by backchaining against
the third typ-rule again, producing unifier [C2 = [(x1, T1)], E2 = var(x1), U1 =
arr(T2, U2)] and subgoal x2 # [(x1, T1)], typ([(x2, T2), (x1, T1)], var(x1), U2). The
freshness subgoal reduces to the constraint x2 # T1, and the typ subgoal can be
solved by backchaining against

    typ(C3, var(X3), T3) :- mem((X3, T3), C3)

using unifier [C3 = [(x2, T2), (x1, T1)], X3 = x1, T3 = U2]. Finally, the
remaining subgoal mem((x1, U2), [(x2, T2), (x1, T1)]) clearly has most general solution
[U2 = T1]. Solving for T, we have T = arr(T1, U1) = arr(T1, arr(T2, U2)) =
arr(T1, arr(T2, T1)). This solution corresponds to the principal type of λx.λy.x.
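
The derivation above can be replayed outside αProlog. The Python sketch below is only an approximation under stated assumptions: it uses ordinary first-order unification without an occurs check, represents terms and types as tuples, and replaces the freshness constraint x # C by letting the most recent binding of a name shadow older ones. All function names are hypothetical.

```python
import itertools

_ids = itertools.count(1)


def new_var():
    """Return a fresh unification variable for types."""
    return ("tv", next(_ids))


def walk(t, s):
    """Follow variable bindings in substitution s."""
    while t[0] == "tv" and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Extend s so that types a and b agree; return None on failure."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if a[0] == "tv":
        return {**s, a: b}
    if b[0] == "tv":
        return {**s, b: a}
    if a[0] == b[0] == "arr":
        s = unify(a[1], b[1], s)
        return None if s is None else unify(a[2], b[2], s)
    return None


def typ(ctx, e, t, s):
    """Mirror the three typ clauses; ctx is a list of (name, type) pairs."""
    if e[0] == "var":
        for name, u in ctx:              # most recent binding shadows older ones
            if name == e[1]:
                return unify(t, u, s)
        return None
    if e[0] == "app":
        a = new_var()
        s = typ(ctx, e[1], ("arr", a, t), s)
        return None if s is None else typ(ctx, e[2], a, s)
    a, b = new_var(), new_var()          # e is ("lam", name, body)
    s = unify(t, ("arr", a, b), s)
    return None if s is None else typ([(e[1], a)] + ctx, e[2], b, s)


def resolve(t, s):
    """Apply substitution s throughout type t."""
    t = walk(t, s)
    if t[0] == "arr":
        return ("arr", resolve(t[1], s), resolve(t[2], s))
    return t
```

Calling `typ([], ("lam", "x", ("lam", "y", ("var", "x"))), T, {})` for a fresh `T` and resolving `T` gives a type of the shape arr(A, arr(B, A)) with distinct variables A and B, matching the solution arr(T1, arr(T2, T1)) above. Because the occurs check is omitted, the sketch should not be used on terms such as self-application.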

Example 2 (Capture-avoiding substitution). Although capture-avoiding
substitution is not a built-in operator in αProlog, it is easy to define via the clauses:

    func subst(exp, exp, id) = exp.

    subst(var(X), E, X)       = E.
    subst(var(Y), E, X)       = var(Y) :- X # Y.
    subst(app(E1, E2), E, X)  = app(subst(E1, E, X), subst(E2, E, X)).
    subst(lam(y.E'), E, X)    = lam(y.subst(E', E, X)) :- y # (X, E).
Note the two freshness side-conditions: the constraint X # Y prevents the first
and second clauses from overlapping; the constraint y # (X, E) ensures
capture-avoidance, by restricting the application of the fourth clause to when y is fresh for
X and E. Despite these side-conditions, this definition is total and deterministic.
Determinism is immediate: no two clauses overlap. Totality follows because, by
nominal logic's freshness principle, the bound name y in lam(y.E') can always be
renamed to a fresh z chosen so that z # (X, E). It is straightforward to prove that
$subst(\ulcorner t \urcorner, \ulcorner t' \urcorner, \mathsf{x})$ coincides with the traditional capture-avoiding substitution on
λ-terms $t[t'/x]$.
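
The totality argument, renaming the bound name to a fresh one by swapping, can be made concrete in a language without built-in freshness. The Python sketch below is an illustration under assumptions: terms are tuples, names are strings, and `fresh_name` is a hypothetical generator standing in for the freshening that αProlog performs automatically. Unlike the αProlog clauses, it renames every bound name it passes.

```python
import itertools

_counter = itertools.count()


def fresh_name(avoid):
    """Return a generated name that does not occur in the set `avoid`."""
    while True:
        n = f"_n{next(_counter)}"
        if n not in avoid:
            return n


def names(e):
    """All names occurring in e, including abstracted ones."""
    if e[0] == "var":
        return {e[1]}
    if e[0] == "app":
        return names(e[1]) | names(e[2])
    return {e[1]} | names(e[2])          # e is ("lam", name, body)


def swap(a, b, e):
    """Exchange the names a and b everywhere in e."""
    def sw(n):
        return b if n == a else a if n == b else n
    if e[0] == "var":
        return ("var", sw(e[1]))
    if e[0] == "app":
        return ("app", swap(a, b, e[1]), swap(a, b, e[2]))
    return ("lam", sw(e[1]), swap(a, b, e[2]))


def subst(e1, e, x):
    """Replace free occurrences of x in e1 by e, avoiding capture."""
    if e1[0] == "var":
        return e if e1[1] == x else e1
    if e1[0] == "app":
        return ("app", subst(e1[1], e, x), subst(e1[2], e, x))
    y, body = e1[1], e1[2]
    # Pick z fresh for x, e and the abstraction, then rename y to z by swapping.
    z = fresh_name(names(e1) | names(e) | {x})
    return ("lam", z, subst(swap(y, z, body), e, x))
```

For example, substituting var(x) for y in lam(x.var(y)) yields an abstraction over a generated name whose body is var(x), so the free x is not captured.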

Consider the goal X = subst(lam(x.var(y)), var(x), y). The substitution on
the right-hand side is in danger of capturing the free variable var(x). How is
capture avoided in αProlog? First, note that function definitions are translated
to a flattened clausal form in αProlog, so we must solve the equivalent goal
subst'(lam(x.var(y)), var(x), y, X) subject to an appropriately translated
definition of subst'. The freshened, flattened clause

    subst'(lam(y1.E1'), E1, X1, lam(y1.E'')) :- y1 # E1, subst'(E1', E1, X1, E'')

unifies with substitution [E1' = var(y), X1 = y, E1 = var(x), X = lam(y1.E'')].
The freshness constraint y1 # var(x) guarantees that var(x) cannot be captured.
It is easily verified, so the goal reduces to subst'(var(y), var(x), y, E''). Using
the freshened rule subst'(var(X2), E2, X2, E2) with unifying substitution [X2 =
y, E2 = var(x), E'' = var(x)], we obtain the solution X = lam(y1.var(x)).

Example 3 (Parsing). Logic programming languages often provide advanced
support for parsing using definite clause grammars (DCGs). DCG parsing can be
implemented in αProlog by translating DCG rules h --> t to ordinary αProlog
programs. We assume familiarity with DCG syntax in this example. We assume
that v(string), ws_opt, ws, token(string) are predefined nonterminals
recognizing variable names, (optional) whitespace, and string tokens. Here is a small
example of parsing λ-terms from strings. (Here, mem1(S, X, M) holds when
(S, X) is the first binding of S in M.)

    type map = [(string, id)].

    ltm(M, var(X))       --> v(S), {mem1(S, X, M)}.
    ltm(M, app(E1, E2))  --> token("("), ws_opt, ltm(M, E1), ws,
                             ltm(M, E2), ws_opt, token(")").
    ltm(M, lam(x.E))     --> {x # M}, token("\"), ws_opt, v(S), ws_opt,
                             token("."), ws_opt, ltm([(S, x)|M], E).

This program parses "\x.\y.x" to lam(x.lam(y.var(x))).

4.1 Extending to the λμ-Calculus

The λμ-calculus, invented by Parigot [17], extends the λ-calculus with
continuations α; terms may be "named" by continuations ([α]e) and continuations
may be introduced with μ-binding (μα.e). Intuitively, λμ-terms are proof terms
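Terms, Types, and Contexts

\[
\begin{array}{rcl}
e & ::= & x \mid (e\ e') \mid \lambda x.e \mid [\alpha]e \mid \mu\alpha.e \\
\tau & ::= & b \mid \tau \to \tau' \mid \bot \\
\Gamma & ::= & \cdot \mid \Gamma, x : \tau \mid \Gamma, \alpha : \bar{\tau}
\end{array}
\]

Replacement Operation

\[
\begin{array}{rcll}
x\{e/\alpha\} & = & x & \\
(e_1\ e_2)\{e/\alpha\} & = & (e_1\{e/\alpha\}\ e_2\{e/\alpha\}) & \\
(\lambda y.e')\{e/\alpha\} & = & \lambda y.e'\{e/\alpha\} & \\
([\alpha]e')\{e/\alpha\} & = & [\alpha](e'\{e/\alpha\}\ e) & \\
([\beta]e')\{e/\alpha\} & = & [\beta](e'\{e/\alpha\}) & (\beta \neq \alpha) \\
(\mu\beta.e')\{e/\alpha\} & = & \mu\beta.e'\{e/\alpha\} & (\beta \notin FN(e, \alpha))
\end{array}
\]

Some Typing Rules

\[
\frac{\alpha : \bar{\tau} \in \Gamma \quad \Gamma \vdash e : \tau}{\Gamma \vdash [\alpha]e : \bot}
\qquad
\frac{\Gamma \vdash e_1 : \bot \quad \Gamma \vdash e_2 : \tau}{\Gamma \vdash (e_1\ e_2) : \bot}
\qquad
\frac{\Gamma, \alpha : \bar{\tau} \vdash e : \bot \quad (\alpha \notin Dom(\Gamma))}{\Gamma \vdash \mu\alpha.e : \tau}
\]

Fig. 2. A slight variant of Parigot's λμ-calculus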



for classical natural deduction, and μ-abstractions represent proofs by double
negation. In addition to capture-avoiding substitution of terms for variables, the
λμ-calculus introduces a capture-avoiding replacement operator $e'\{e/\alpha\}$ which
replaces each occurrence of the pattern $[\alpha]e_0$ in $e'$ with $[\alpha](e_0\ e)$. We give a
variant of the λμ-calculus in Figure 2. In contexts $\Gamma$, the bar over the type of
$\alpha$ indicates that it is not a value of type $\tau$, but a continuation accepting the
type $\tau$.

We may extend the λ-calculus encoding with a new name type con for
continuations and term constructors for λμ-terms:

    con : name_type.    pass : (con, exp) → exp.    mu : ⟨con⟩exp → exp.

and encoding $\ulcorner [\alpha]t \urcorner = pass(\mathsf{a}, \ulcorner t \urcorner)$ and $\ulcorner \mu\alpha.t \urcorner = mu(\mathsf{a}.\ulcorner t \urcorner)$. Again, it is easy
to show that ground exp-terms are in bijective correspondence with λμ-terms.

The standard approach to typechecking λμ-terms is to use two contexts, $\Gamma$
and $\Delta$, for variable- and continuation-bindings respectively. We instead consider
a single context with variable-bindings $x : \tau$ and continuation-bindings $\alpha : \bar{\tau}$.
Therefore we modify the encoding of contexts slightly as follows:

    bind : type.    vb : (id, ty) → bind.    cb : (con, ty) → bind.    type ctx = [bind].

Then the typechecking rules from the previous section may be adapted by
replacing bindings (x, T) with vb(x, T), and adding three new rules:

    typ(C, pass(X, E), bot) :- mem(cb(X, T), C), typ(C, E, T).
    typ(C, app(E, E'), bot) :- typ(C, E, bot), typ(C, E', T).
    typ(C, mu(a.E), T)      :- a # C, typ([cb(a, T)|C], E, bot).

The following query illustrates the typechecking for the term λx.μα.(x (λy.[α]y))
whose principal type corresponds to the classical double negation law.

    ? typ([], lam(x.mu(a.app(var(x), lam(y.pass(a, var(y)))))), T).

    T = arr(arr(arr(T162, bot), bot), T162)
Capture-avoiding substitution can be extended to λμ-terms easily. For
replacement, we show the interesting cases for continuation applications and
μ-abstractions:

    func repl(exp, exp, con) = exp.

    repl(pass(A, E'), E, A) = pass(A, app(repl(E', E, A), E)).
    repl(pass(B, E'), E, A) = pass(B, repl(E', E, A)) :- A # B.
    repl(mu(b.E'), E, A)    = mu(b.repl(E', E, A)) :- b # (A, E).

This first-order NAS encoding is quite different in flavor from a HOAS
encoding of the λμ-calculus due to Abel [1]. There, $\mu\alpha.t$ is encoded as $mu(\lambda \ulcorner\alpha\urcorner . \ulcorner t \urcorner)$,
where $mu : ((tm\ A \to nam) \to nam) \to tm\ A$, and $tm\ A$ is the type of terms
of type $A$ and $nam$ the type of named terms. Continuations are encoded as
variables $\ulcorner\alpha\urcorner : (tm\ A \to nam)$ and named terms $[\alpha]t$ are encoded as
applications $\ulcorner\alpha\urcorner\ \ulcorner t \urcorner$. While elegant, this third-order encoding is semantically complex,
making proofs of interesting properties difficult.

5 Example: The π-Calculus

The π-calculus is a calculus of concurrent, mobile processes. Its syntax (following
Milner, Parrow, and Walker [14]) is described by the grammar rules shown in
Figure 3. The symbols $x, y, \ldots$ are channel names. The inactive process 0 is inert.
The $\tau.p$ process performs a silent action $\tau$ and then does $p$. Parallel composition
is denoted $p|q$ and nondeterministic choice by $p + q$. The process $x(y).p$ inputs a
channel name from $x$, binds it to $y$, and then does $p$. The process $\bar{x}y.p$ outputs
$y$ to $x$ and then does $p$. The match operator $[x = y]p$ is $p$ provided $x = y$, but
is inactive if $x \neq y$. The restriction operator $(y)p$ restricts $y$ to $p$. Parenthesized
names (e.g. $y$ in $x(y).p$ and $(y)p$) are binding, and $fn(p)$, $bn(p)$ and $n(p)$ denote
the sets of free, bound, and all names occurring in $p$. Capture-avoiding renaming
is written $t\{x/y\}$.

Milner et al.'s original operational semantics (shown in Figure 3, symmetric
cases omitted) is a labeled transition system with relation $p \xrightarrow{a} q$ indicating "$p$



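\[
\begin{array}{lrcl}
\text{Process terms} & p & ::= & 0 \mid \tau.p \mid p|q \mid p + q \mid x(y).p \mid \bar{x}y.p \mid [x = y]p \mid (x)p \\
\text{Actions} & a & ::= & \tau \mid x(y) \mid \bar{x}y \mid \bar{x}(y)
\end{array}
\]

\[
\frac{}{\tau.p \xrightarrow{\tau} p}
\qquad
\frac{p \xrightarrow{a} p'}{p + q \xrightarrow{a} p'}
\qquad
\frac{p \xrightarrow{a} p' \quad bn(a) \cap fn(q) = \emptyset}{p|q \xrightarrow{a} p'|q}
\qquad
\frac{p \xrightarrow{\bar{x}y} p' \quad q \xrightarrow{x(z)} q'}{p|q \xrightarrow{\tau} p'|q'\{y/z\}}
\qquad
\frac{p \xrightarrow{\bar{x}(w)} p' \quad q \xrightarrow{x(w)} q'}{p|q \xrightarrow{\tau} (w)(p'|q')}
\]

\[
\frac{}{\bar{x}y.p \xrightarrow{\bar{x}y} p}
\qquad
\frac{w \notin fn((z)p)}{x(z).p \xrightarrow{x(w)} p\{w/z\}}
\qquad
\frac{p \xrightarrow{a} p'}{[x = x]p \xrightarrow{a} p'}
\qquad
\frac{p \xrightarrow{a} p' \quad y \notin n(a)}{(y)p \xrightarrow{a} (y)p'}
\qquad
\frac{p \xrightarrow{\bar{x}y} p' \quad y \neq x \quad w \notin fn((y)p)}{(y)p \xrightarrow{\bar{x}(w)} p'\{w/y\}}
\]

Fig. 3. The π-calculus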
    func ren_p(proc, chan, chan) = proc.    (* definition omitted *)

    pred safe(act, pr).                     (* tests bn(A) ∩ fn(P) = ∅ *)
    safe(tau_a, P).
    safe(fout_a(X, Y), P).
    safe(bout_a(X, Y), P) :- Y # P.
    safe(in_a(X, Y), P)   :- Y # P.

    pred step(pr, act, pr).                 (* encodes p -a-> p' *)
    step(tau(P), tau_a, P).
    step(par(P, Q), A, par(P', Q))          :- step(P, A, P'), safe(A, Q).
    step(par(P, Q), tau_a, par(P', Q''))    :- step(P, fout_a(X, Y), P'),
                                               step(Q, in_a(X, Z), Q'),
                                               Q'' = ren_p(Q', Y, Z).
    step(sum(P, Q), A, P')                  :- step(P, A, P').
    step(out(X, Y, P), fout_a(X, Y), P).
    step(in(X, z.P), in_a(X, W), P')        :- W # z.P, P' = ren_p(P, W, z).
    step(match(X, X, P), A, P')             :- step(P, A, P').
    step(par(P, Q), tau_a, res(z.par(P', Q'))) :- step(P, bout_a(X, z), P'),
                                               step(Q, in_a(X, z), Q').
    step(res(y.P), A, res(y.P'))            :- y # A, step(P, A, P').
    step(res(y.P), bout_a(X, W), P'')       :- step(P, fout_a(X, y), P'), y # X,
                                               W # y.P, P'' = ren_p(P', W, y).

Fig. 4. π-calculus transitions in αProlog



steps to q by performing action a”. Actions τ, x̄y, x(y), x̄(y) are referred to as
silent, free output, input, and bound output actions respectively; the first two are
called free and the second two are called bound actions. For an action a, n(a) is
the set of all names appearing in a, and bn(a) is empty if a is a free action and
is {y} if a is a bound action x(y) or x̄(y). Processes and actions can be encoded
using the following syntax:

chan : name_type.    proc : type.    ina : proc.    tau : proc → proc.
par, sum : (proc, proc) → proc.      in : (chan, ⟨chan⟩proc) → proc.
out, match : (chan, chan, proc) → proc.    res : (⟨chan⟩proc) → proc.
act : type.    tau_a : act.    in_a, fout_a, bout_a : (chan, chan) → act.

Much of the complexity of the rules is due to the need to handle scope
extrusion, which occurs when restricted names “escape” their scope because of
communication. In $((x)\bar{a}x.p) \mid (a(z).z(x).0) \xrightarrow{\tau} (x')(p \mid x'(x).0)$, for example, it is
necessary to “freshen” x to x′ in order to avoid capturing the free x in a(z).z(x).0.
Bound output actions are used to lift the scope of an escaping name out to the
point where it is received. The rules can be translated directly into αProlog (see
Figure 4). The function ren_p(P, Y, X) performing capture-avoiding renaming is
not shown, but easy to define.

We can check that this implementation of the operational semantics produces
correct answers for the following queries:

? step(res(x.par(res(y.out(x, y, ina)), in(x, z.out(z, x, ina)))), A, P).
A = tau_a, P = res(y58.res(z643.par(ina, out(z643, y58, ina))))

? step(res(x.out(x, y, ina)), A, P).
No.

This αProlog session shows that $(x)((y)\bar{x}y.0 \mid x(y).\bar{y}x.0) \xrightarrow{\tau} (x)(y)(0 \mid \bar{y}x.0)$,
but $(x)(\bar{x}y.0)$ cannot make any transition. Moreover, the answer to the first
query is unique (up to renaming).
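
The renaming function ren_p is left undefined in Figure 4. Purely as an illustration, and in Python rather than αProlog, the sketch below shows one way capture-avoiding renaming p{new/old} could be written over a hypothetical tuple encoding of process terms. The encoding and the helper names `fresh`, `all_names` and `ren` are assumptions of this example, not the authors' definitions. In αProlog itself the freshening step would be handled by name abstraction and freshness constraints rather than by an explicit counter.

```python
import itertools

# Hypothetical encoding: ("nil",), ("tau", p), ("par", p, q), ("sum", p, q),
# ("in", x, y, p) binds y, ("out", x, y, p), ("match", x, y, p), ("res", y, p) binds y.
NIL = ("nil",)
_counter = itertools.count()

def all_names(p):
    """Collect every name occurring in p, free or bound."""
    found = set()
    for part in p[1:]:
        found |= all_names(part) if isinstance(part, tuple) else {part}
    return found

def fresh(avoid):
    """Return a generated name that does not occur in `avoid`."""
    while True:
        candidate = f"_n{next(_counter)}"
        if candidate not in avoid:
            return candidate

def ren(p, new, old):
    """Replace the free occurrences of `old` in p by `new`, avoiding capture."""
    sub = lambda n: new if n == old else n
    tag = p[0]
    if tag == "nil":
        return p
    if tag == "tau":
        return ("tau", ren(p[1], new, old))
    if tag in ("par", "sum"):
        return (tag, ren(p[1], new, old), ren(p[2], new, old))
    if tag in ("out", "match"):
        return (tag, sub(p[1]), sub(p[2]), ren(p[3], new, old))
    if tag in ("in", "res"):
        head = (tag, sub(p[1])) if tag == "in" else (tag,)
        binder, body = p[-2], p[-1]
        if binder == old:                  # old is shadowed below this binder
            return head + (binder, body)
        if binder == new:                  # binder would capture new: freshen it
            f = fresh(all_names(body) | {new, old})
            body, binder = ren(body, f, binder), f
        return head + (binder, ren(body, new, old))
    raise ValueError(f"unknown process constructor: {tag}")

# Renaming y to x under a restriction on x forces the bound x to be freshened.
renamed = ren(("res", "x", ("out", "x", "y", NIL)), "x", "y")
```

The last line mirrors the scope-extrusion situation discussed above: the restricted x is renamed to a generated name before the free y is replaced by x.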

Röckl [20] and Gabbay [6] have also considered encodings of the π-calculus
using nominal abstract syntax. Röckl considered only modeling the syntax of
terms up to α-equivalence using swapping, whereas Gabbay went further,
encoding transitions and the bisimulation relation and proving basic properties
thereof. By [6, Thm 4.5], Gabbay’s version of the π-calculus is equivalent to our
conventional representation. In fact, Gabbay’s presentation is a bit simpler and
easier to reason about, but we have chosen Milner et al.’s original presentation
to emphasize that informal “paper” presentations (even for fairly complicated
calculi) can be translated directly to αProlog programs.



6 Concluding Remarks 

6.1 Related Work 

FreshML [22]: an extension of the ML programming language with Gabbay-
Pitts names, name-binding with pattern matching, and fresh name generation.
αProlog is related to FreshML in many ways, and it is fair to say that αProlog is
to logic programming what FreshML is to functional programming. We believe
however that the differences between FreshML and αProlog are more than
cosmetic. αProlog lends itself to a declarative style of nameful programming which
is refreshingly close to informal declarative presentations of operational
semantics, type systems and logics, in contrast to FreshML which remains procedural
(and effectful) at heart. There are also major differences from a technical point of
view: in FreshML much research went into designing an expressive type-system,
while the problems we face in αProlog concern the design of an efficient proof
search procedure (see [26]).

Qu-Prolog [24]: an extension of Prolog with built-in names, binding, and
explicit capture-avoiding substitutions and unification up to both α-equivalence
and substitution evaluation. Qu-Prolog includes “not free in” constraints
corresponding to our freshness constraints. Nevertheless, there are significant
differences; αProlog is not a reinvention of Qu-Prolog. First, αProlog is a strongly
typed polymorphic language, in contrast to Qu-Prolog, which is untyped in the
Prolog tradition. Second, αProlog is based on a simpler unification algorithm
that unifies up to α-equivalence but not up to substitution. Finally, Qu-Prolog
lacks a logical semantics, and because of its internalized treatment of capture-
avoiding substitution, developing one would likely be difficult. In contrast,
nominal logic [19] provides a solid semantic foundation for αProlog.

Logic Programming with Binding Algebras: Hamana [9] has formalized a
logic programming language based on Fiore, Turi, and Plotkin’s binding
algebras [5]. No implementation of this language appears to be available. However,
since binding algebras are a formalization of HOAS, we believe that this approach
will also share the semantic complexity of HOAS.

Lλ: Miller [12] discovered a restricted form of higher-order logic programming
called Lλ in which unification is efficiently decidable and HOAS encodings are
possible, but built-in capture-avoiding substitution is not available. There are
several interesting parallels between Lλ and αProlog (and nominal unification
and Lλ unification); relating these languages is future work.

Delphin: Schürmann et al. [21] are developing a functional programming
language called Delphin which supports advanced meta-programming with
recursion over higher-order abstract syntax. This approach seems very powerful, but
also very complex because it is based on HOAS.



6.2 Status and Future Work 

We have implemented an interpreter for αProlog based on nominal unification
as outlined in this paper, along with many additional example programs, such
as translation to a small typed assembly language, evaluation for a core object
calculus, and modeling a cryptographic authentication protocol. The
implementation is available online¹. Some additional applications of interest, such as type
inference for a small ML-like language and translations from regular expressions
to finite automata, do not work properly because of αProlog’s current
incomplete implementation. Therefore we are very interested in developing techniques
for equivariant unification and resolution.

Following Miller et al. [13], we have formulated a uniform proof theoretic
semantics for nominal hereditary Harrop formulas based on a sequent calculus
for nominal logic [7]. A more traditional model-theoretic semantics is in
development. We also plan to develop mode and determinism analyses for αProlog.
Another interesting direction is the possibility of integrating αProlog’s constraint
domain into an existing constraint logic programming system.

One deficiency of αProlog relative to HOAS systems and Qu-Prolog is that
capture-avoiding substitution is not built-in, but must be written by hand when
needed. We are currently experimenting with an operation ·{·/·} : (α, β, β) → α,
such that t{u/v} denotes the result of replacing the term v with u everywhere
in t, renaming abstractions to avoid capture. Thus, subst(t, u, x) for λ- or λμ-
terms can be written as t{u/var(x)}, and ren_p(p, a, b) as p{a/b}. Currently
there are ad hoc restrictions such as that t and v must be ground when t{u/v}
is evaluated. Also, this does not help with unusual substitution-like operations
such as λμ-calculus replacement. Developing a logical account of substitution in
nominal logic and αProlog is an important area for future work.

¹ http://www.cs.cornell.edu/People/jcheney/aprolog/



6.3 Summary 

Though still a work in progress, αProlog shows great promise. Although αProlog
is not the first language to include special constructs for dealing with variable
binding, αProlog allows programming much closer to informal “paper”
definitions than any other extant system. We have given several examples of languages
that can be defined both declaratively and concisely in αProlog. We have also
described the operational semantics for core αProlog, which is sound with
respect to nominal logic, but complete only for a class of well-behaved programs.
Additional work is needed to develop practical techniques for equivariant
unification necessary for complete nominal resolution, and to develop static analyses
and other forms of reasoning about αProlog programs. More broadly, we view
αProlog as a modest first step toward a nominal logical framework for
reasoning about programming languages, logics, and type systems encoded in nominal
abstract syntax.

Abstract. Logic Programming has been advocated as a language for system
specification, especially for systems involving logical behaviours, rules and
knowledge. However, modeling problems involving negation, which is quite natural in
many cases, is somewhat limited if Prolog is used as the specification /
implementation language. These restrictions are not related to the theoretical viewpoint, where
users can find many different models with their respective semantics; they
concern practical implementation issues. The negation capabilities supported by
current Prolog systems are rather constrained, and there is no correct and complete
implementation available. In this paper, we refine and propose some extensions
to the classical method of constructive negation, providing the complete
theoretical algorithm. Furthermore, we also discuss implementation issues, providing a
preliminary implementation and also an optimized one to negate predicates with
a finite number of solutions.

Keywords: Constructive Negation, Negation in Logic Programming, Constraint 
Logic Programming, Implementations of Logic Programming, Optimization. 



1 Introduction 

From its very beginning Logic Programming has been advocated to be both a program- 
ming language and a specification language. It is natural to use Logic Programming for 
specifying/programming systems involving logical behaviours, rules and knowledge. 
However, this idea has a severe limitation: the use of negation. Negation is probably the 
most significant aspect of logic that was not included from the outset. This is due to the 
fact that dealing with negation involves significant additional complexity. Nevertheless, 
the use of negation is very natural and plays an important role in many cases, for in- 
stance, constraints management in databases, program composition, manipulation and 
transformation, default reasoning, natural language processing, etc. 

Although this restriction cannot be perceived from the theoretical point of view
(because there are many ways to understand and incorporate negation into Logic
Programming), the problems really start at the semantic level, where the different
proposals (negation as failure -naf-, stable models, well-founded semantics, explicit
negation, etc.) differ not only as to expressiveness but also as to semantics. However,
the negation techniques supported by current Prolog¹ compilers are rather limited:
they are restricted to negation as failure under Fitting/Kunen semantics [13] (sound
only under some circumstances that are usually not checked by compilers), which is a
built-in or library in most Prolog compilers (Quintus, SICStus, Ciao, BinProlog, etc.),
and the “delay technique” (applying negation as failure only when the variables of the
negated goal become ground, which is sound but incomplete due to the possibility of
floundering), which is present in Nu-Prolog, Gödel, and Prolog systems that implement
delays (most of the above).

* This work was partly supported by the Spanish MCYT project TIC2003-01036.

B. Demoen and V. Lifschitz (Eds.): ICLP 2004, LNCS 3132, pp. 284-298, 2004.

© Springer-Verlag Berlin Heidelberg 2004

Of all the proposals, constructive negation [8,9] (which we will call classical
constructive negation) is probably the most promising because it has been proven to be
sound and complete, and its semantics is fully compatible with Prolog’s. Constructive
negation was, in fact, announced in early versions of the Eclipse Prolog compiler, but
was removed from the latest releases. The reasons seem to be related to some
technical problems with the use of coroutining (risk of floundering) and the management of
constrained solutions. We are trying to fill a long-standing gap in this area (remember
that the original papers date from the late 80s) by tackling the problem of providing a
correct implementation.

The goal of this paper is to give an algorithmic description of constructive negation,
i.e. explicitly stating the details needed for an implementation. We also intend to
discuss the pragmatic ideas needed to provide a concrete and real implementation. We are
combining several different techniques: implementation of disequality constraints,
program transformation, efficient management of constraints on the Herbrand universe, etc.
While many of them are relatively easy to understand (and the main inspiration is, of
course, in papers on theoretical aspects of constructive negation, including Chan’s),
the main novelty of this work is the way we combine them by reformulating constructive
negation in an implementation-oriented way. In fact, results for a concrete
implementation extending the Ciao Prolog compiler are presented. Due to space limitations,
we assume some familiarity with constructive negation techniques [8, 9].

On the side of related work, unfortunately we cannot compare our work with any
existing implementation of classical constructive negation in Prolog (nor even with
implementations of other negation techniques, like intensional negation [3, 6, 16] or
negation as instantiation [20], where many papers discuss the theoretical aspects but not
implementation details) because we have not found any reported practical realization
in the literature. However, there are some very interesting experiences, notably XSB
prototypes implementing well-founded semantics [1]. Less interesting seem to be the
implementation of constructive negation reported in [5], because of the severe limitations
on source programs (they cannot contain free variables in clauses), and the prototype
sketched in [2], where a bottom-up computation of literal answers is discussed (no
execution times are reported, but it is easy to deduce inefficiency both in terms of time and
memory).

The remainder of the paper is organized as follows. Section 2 details our
constructive negation algorithm. It explains how to obtain the frontier of a goal (Section 2.1),
how to prepare the goal for negation (Section 2.2) and, finally, how to negate the goal
(Section 2.3). Section 3 discusses implementation issues: code expansion (Section 3.1),
required disequality constraints (Section 3.2) and optimizations (Section 3.3). Section 4
provides some experimental results and Section 5 talks about a variant of our
implementation for negating goals that have a finite number of solutions. Finally, we conclude and
outline some future work in Section 6.

¹ We understand Prolog as depth-first, left to right implementation of SLD resolution for Horn
clause programs, ignoring, in principle, side effects, cuts, etc.

2 Constructive Negation 

Most of the papers addressing constructive negation deal with semantic aspects. In fact,
only the original papers by Chan gave some hints about a possible implementation based
on coroutining, but the technique was only outlined. When we tried to reconstruct this
implementation we came across several problems, including the management of
constrained answers and floundering (which appears to be the main reason why
constructive negation was removed from recent versions of Eclipse). It is our belief that these
problems cannot be easily and efficiently overcome. Therefore, we decided to design
an implementation from scratch. One of our additional requirements is that we want
to use a standard Prolog implementation, so that Prolog programs with negation
can reuse libraries and existing Prolog code. Additionally, we want to maintain the
efficiency of these Prolog programs, at least for the part that does not use negation. In
this sense, we will avoid implementation-level manipulations that would slow down simple
programs without negation.

We start with the definition of a frontier and how it can be managed to negate the 
respective formula. 

2.1 Frontier 

Firstly, we present Chan’s definition of frontier (we actually owe the formal definition 
to Stuckey [23]). 

Definition 1. Frontier 

A frontier of a goal G is the disjunction of a finite set of nodes in the derivation tree 
such that every derivation of G is either finitely failed or passes through exactly one 
frontier node. 

What is missing is a method to generate the frontier. So far we have used the
simplest possible frontier: the frontier of depth 1 obtained by taking all the possible single
SLD resolution steps. This can be done by a simple inspection of the clauses of the
program². Additionally, built-in based goals receive a special treatment (moving
conjunctions into disjunctions, disjunctions into conjunctions, eliminating double negations,
etc.).

Definition 2. Depth-one frontier

- If $G \equiv (G_1 ; G_2)$ then $Frontier(G) \equiv Frontier(G_1) \vee Frontier(G_2)$.

- If $G \equiv (G_1 , G_2)$ then $Frontier(G) \equiv Frontier(G_1) \wedge Frontier(G_2)$, and then we have
to apply the distributive property to retain the disjunction-of-conjunctions
format.

- If $G \equiv p(\bar{X})$ and predicate p/m is defined by N clauses:

    $p(\bar{X}^1)$ :- $C'_1$.
    ...
    $p(\bar{X}^N)$ :- $C'_N$.

The frontier of the goal has the format $Frontier(G) \equiv C_1 \vee C_2 \vee \dots \vee C_N$, where
each $C_i$ is the union of the conjunction of subgoals $C'_i$ plus the equalities that are
needed to unify the variables of $\bar{X}$ and the respective terms of $\bar{X}^i$.

² Nevertheless, we plan to generate the frontier in a more efficient way by using abstract
interpretation over the input program for detecting the degree of evaluation of a term that will be
necessary at execution time.

The definition is an easy adaptation of Chan’s, but it is also a simple example of
the way we attack the problem: reformulating already-defined concepts in an
implementation-oriented way.

Consider, for instance, the following code:

odd(s(0)).
odd(s(s(X))) :- odd(X).

The frontier for the goal odd(Y) is as follows:

$Frontier(odd(Y)) \equiv \{(Y = s(0)) \vee (Y = s(s(X)) \wedge odd(X))\}$
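
To make the third case of Definition 2 concrete, here is a minimal Python sketch that builds the depth-one frontier of an atomic goal by inspecting the program clauses. The term encoding (capitalised strings for variables, tuples for compound terms), the `PROGRAM` table and the function name `frontier` are assumptions made for this illustration. Renaming clause variables apart is deliberately omitted, and this is not the Ciao implementation described later.

```python
# Terms: variables are capitalised strings, constants are other strings,
# compound terms are tuples ("functor", arg1, ..., argn).
# Each clause is a pair (head_arguments, body_literals).
PROGRAM = {
    ("odd", 1): [
        ((("s", "0"),), []),                       # odd(s(0)).
        ((("s", ("s", "X")),), [("odd", "X")]),    # odd(s(s(X))) :- odd(X).
    ],
}

def frontier(goal):
    """Return the depth-one frontier of an atomic goal as a list of disjuncts.

    Each disjunct C_i is a list (conjunction) made of the equalities between
    the goal arguments and the clause head arguments, followed by the body.
    """
    name, *args = goal
    disjuncts = []
    for head_args, body in PROGRAM[(name, len(args))]:
        equalities = [("=", a, h) for a, h in zip(args, head_args)]
        disjuncts.append(equalities + list(body))
    return disjuncts

odd_frontier = frontier(("odd", "Y"))
```

For the goal odd(Y) this yields two disjuncts, matching the frontier shown above: one with the single equality Y = s(0), and one with Y = s(s(X)) followed by the subgoal odd(X).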

To get the negation of G it suffices to negate the frontier formula. This is done
by negating each component of the disjunction of all implied clauses (that form the
frontier) and combining the results. That is, $\neg G \equiv \neg Frontier(G) \equiv \neg C_1 \wedge \dots \wedge \neg C_N$.
Therefore, the solutions of cneg(G) are the result of the combination (conjunction)
of one solution of each $\neg C_i$. So, we are going to explain how to negate a single
conjunction $C_i$. This is done in two phases: Preparation and Negation of the formula.
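
The combination step can be pictured as a Cartesian product: every answer of cneg(G) picks one answer from each negated disjunct. The Python sketch below illustrates this under the assumption that the answers of each negated disjunct are already available as finite lists. That is a simplification, since in general they are produced lazily and may be infinite. The constraint strings and the function name `combine` are purely illustrative.

```python
from itertools import product

def combine(negated_disjuncts):
    """Combine answers of the negated disjuncts into answers of cneg(G).

    negated_disjuncts[i] is the list of alternative answers of the negation
    of C_i; each answer is a list of constraints (strings here).
    """
    return [sum(choice, []) for choice in product(*negated_disjuncts)]

# Illustrative answers for the two disjuncts of Frontier(odd(Y)):
neg_c1 = [["Y =/= s(0)"]]
neg_c2 = [["forall X. Y =/= s(s(X))"],
          ["Y = s(s(X))", "cneg(odd(X))"]]

answers = combine([neg_c1, neg_c2])
```

With one answer for the first disjunct and two for the second, two combined answers are obtained.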

2.2 Preparation 

Before negating a conjunction obtained from the frontier, we have to simplify, organize,
and normalize this conjunction. The basic ideas are present in [8] in a rather obscure
way. In Chan’s papers, and then in Stuckey’s, it is simply stated that the conjunction
is negated using standard logic techniques. Of course, this is true, but it is not so easy in
the middle of a Prolog computation because we do not have access to the whole formula
or predicate we are executing³.

- Simplification of the conjunction. If one of the terms of $C_i$ is trivially equivalent
to true (e.g. X = X), we can eliminate this term from $C_i$. Symmetrically, if one of the
terms is trivially fail (e.g. X ≠ X), we can simplify $C_i \equiv fail$. The simplification
phase can be carried out during the generation of frontier terms.

- Organization of the conjunction. Three groups are created containing the
components of $C_i$, which are divided into equalities ($I$), disequalities ($D$), and other
subgoals ($R$). Then, we get $C_i \equiv I \wedge D \wedge R$.

- Normalization of the conjunction. Let us classify the variables in the formula.
The set of variables of the goal, G, is called GoalVars. The set of free variables of
$R$ is called RelVars.

• Elimination of redundant variables and equalities. If $I_i \equiv X = Y$, where
$Y \notin GoalVars$, then we now have the formula
$(I_1 \wedge \dots \wedge I_{i-1} \wedge I_{i+1} \wedge \dots \wedge I_{NI} \wedge D \wedge R)\sigma$,
where $\sigma = \{Y/X\}$, i.e. the variable Y is substituted by X in the entire
formula.

• Elimination of irrelevant disequalities. ImpVars is the set of variables of
GoalVars and the variables that appear in $I$. The disequalities $D_i$ that contain
any variable that was neither in ImpVars nor in RelVars are irrelevant and
should be eliminated.

³ Unless we use metaprogramming techniques that we try to avoid for efficiency reasons.
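
The simplification and organization phases can be sketched in a few lines of Python. The literal encoding (tuples tagged with "=" or "=/=", anything else being an ordinary subgoal) and the function name `prepare` are assumptions of this example. The normalization phase, which needs variable sets and substitutions, is not covered.

```python
def prepare(conjunction):
    """Simplify a frontier conjunction and split it into (I, D, R).

    Literals are tuples: ("=", lhs, rhs), ("=/=", lhs, rhs) or (pred, arg, ...).
    Returns None when the conjunction trivially reduces to fail.
    """
    equalities, disequalities, others = [], [], []
    for lit in conjunction:
        if lit[0] == "=":
            if lit[1] == lit[2]:
                continue                 # X = X is trivially true: drop it
            equalities.append(lit)
        elif lit[0] == "=/=":
            if lit[1] == lit[2]:
                return None              # X =/= X is trivially false: C_i is fail
            disequalities.append(lit)
        else:
            others.append(lit)
    return equalities, disequalities, others

c = [("=", "Y", ("s", ("s", "X"))), ("=", "X", "X"), ("=/=", "X", "0"), ("odd", "X")]
organized = prepare(c)
```

In this example the trivial equality X = X disappears and the remaining literals are grouped as I, D and R.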



2.3 Negation of the Formula

It is not feasible to get all solutions of $C_i$ and to negate their disjunction, because $C_i$
can have an infinite number of solutions. So, we have to use the classical constructive
negation algorithm.

We consider that ExpVars is the set of variables of $R$ that are not in ImpVars, i.e.
RelVars, except the variables of $I$ in the normalized formula.

First step: Division of the formula

$C_i$ is divided into: $C_i \equiv I \wedge D_{imp} \wedge R_{imp} \wedge D_{exp} \wedge R_{exp}$

where $D_{exp}$ are the disequalities in $D$ with variables in ExpVars and $D_{imp}$ are the other
disequalities, $R_{exp}$ are the goals of $R$ with variables in ExpVars and $R_{imp}$ are the other
goals, and $I$ are the equalities.

Therefore, the constructive negation of the divided formula is:

\[ \neg C_i \equiv \neg I \vee (I \wedge \neg D_{imp}) \vee (I \wedge D_{imp} \wedge \neg R_{imp}) \vee (I \wedge D_{imp} \wedge R_{imp} \wedge \neg(D_{exp} \wedge R_{exp})) \]

It is not possible to separate $D_{exp}$ and $R_{exp}$ because they contain free variables and
they cannot be negated separately. The answers of the negation will be the answers of
the negation of the equalities, the answers of the negation of the disequalities without
free variables, the answers of the negation of the subgoals without free variables and
the answers of the negation of the other subgoals of the conjunctions (the ones with free
variables). Each of them will be obtained as follows:

Second step: Negation of subformulas

- Negation of $I$. We have
$I \equiv I_1 \wedge \dots \wedge I_{NI} \equiv \exists \bar{Z}_1\, X_1 = t_1 \wedge \dots \wedge \exists \bar{Z}_{NI}\, X_{NI} = t_{NI}$,
where $\bar{Z}_i$ are the variables of the equality $I_i$ that are not included in GoalVars (i.e.
that are not quantified and are therefore free variables). When we negate this
conjunction of equalities we get the constraint

\[ \forall \bar{Z}_1\, X_1 \neq t_1 \vee \dots \vee \forall \bar{Z}_{NI}\, X_{NI} \neq t_{NI} \equiv \bigvee_{i=1}^{NI} \forall \bar{Z}_i\, X_i \neq t_i \]

This constraint is the first answer of the negation of $C_i$ that contains $NI$ components.

- Negation of $D_{imp}$. If we have $N_{D_{imp}}$ disequalities $D_{imp} \equiv D_1 \wedge \dots \wedge D_{N_{D_{imp}}}$, where
$D_i \equiv \forall \bar{W}_i\, \exists \bar{Z}_i\, Y_i \neq s_i$, $Y_i$ is a variable of ImpVars, $s_i$ is a term without
variables in ExpVars, and $\bar{W}_i$ are universally quantified variables that are neither in the
equalities⁴ nor in the other goals of $R$ (because otherwise $D_i$ would be a disequality
of $D_{exp}$), then we will get $N_{D_{imp}}$ new solutions with the format:

\[ I \wedge \neg D_1 \]
\[ I \wedge D_1 \wedge \neg D_2 \]
\[ \dots \]
\[ I \wedge D_1 \wedge \dots \wedge D_{N_{D_{imp}}-1} \wedge \neg D_{N_{D_{imp}}} \]

where $\neg D_i \equiv \exists \bar{W}_i\, Y_i = s_i$. The negation of a universal quantification turns into
an existential quantification and the quantification of the free variables of $\bar{Z}_i$ gets lost,
because the variables are unified with the evaluation of the equalities of $I$. Then, we
will get $N_{D_{imp}}$ new answers.

- Negation of $R_{imp}$. If we have $N_{R_{imp}}$ subgoals $R_{imp} \equiv R_1 \wedge \dots \wedge R_{N_{R_{imp}}}$, then we will
get new answers from each of the conjunctions:

\[ I \wedge D_{imp} \wedge \neg R_1 \]
\[ I \wedge D_{imp} \wedge R_1 \wedge \neg R_2 \]
\[ \dots \]
\[ I \wedge D_{imp} \wedge R_1 \wedge \dots \wedge R_{N_{R_{imp}}-1} \wedge \neg R_{N_{R_{imp}}} \]

where $\neg R_i \equiv cneg(R_i)$. Constructive negation is again applied over $R_i$ recursively
using this operational semantics.

- Negation of $D_{exp} \wedge R_{exp}$. This conjunction cannot be separated because of the
negation of $\exists \bar{V}_{exp}\, D_{exp} \wedge R_{exp}$, where $\bar{V}_{exp}$ gives universal quantifications:
$\forall \bar{V}_{exp}\, cneg(D_{exp} \wedge R_{exp})$. The entire constructive negation algorithm must be
applied again. Notice the recursive application of constructive negation. However, the
previous steps could have generated an answer for the original negated goal. Of
course, it is possible to produce infinitely many answers to a negated goal.

Note that the new set GoalVars is the former set ImpVars. Variables of $\bar{V}_{exp}$ are
considered as free variables. When solutions of $cneg(D_{exp} \wedge R_{exp})$ are obtained,
some can be rejected: solutions with equalities with variables in $\bar{V}_{exp}$. If there is
a disequality with any of these variables, e.g. $V$, the variable will be universally
quantified in the disequality. This is the way to obtain the negation of a goal, but
there is a detail that was not considered in former approaches and that is necessary
to get a sound implementation: the existence of universally quantified variables in
$D_{exp} \wedge R_{exp}$ by the iterative application of the method. That is, we are really
negating a subgoal of the form $\exists \bar{V}_{exp}\, D_{exp} \wedge R_{exp}$. Its negation is
$\forall \bar{V}_{exp}\, \neg(D_{exp} \wedge R_{exp})$ and therefore, we will provide the last group of answers
that comes from:

\[ I \wedge D_{imp} \wedge R_{imp} \wedge \forall \bar{V}_{exp}\, \neg(D_{exp} \wedge R_{exp}) \]
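
The shape of the answer groups produced by the second step can be enumerated mechanically. The following Python sketch does this purely symbolically, with strings standing for the subformulas. The function name `negation_cases` and the string notation are assumptions made for illustration, and no actual constraint solving or recursive call to cneg is performed.

```python
def negation_cases(i, d_imp, r_imp, exp=None):
    """List the conjunctions whose answers make up the negation of C_i.

    i      -- the equalities I, as one symbolic string
    d_imp  -- list of disequalities D_1 ... D_n without ExpVars variables
    r_imp  -- list of subgoals R_1 ... R_m without ExpVars variables
    exp    -- symbolic string for D_exp and R_exp together, or None if empty
    """
    cases = [[f"not({i})"]]                      # negation of the equalities
    prefix = [i]
    for literal in d_imp + r_imp:                # one case per D_k, then per R_k
        cases.append(prefix + [f"not({literal})"])
        prefix = prefix + [literal]
    if exp is not None:                          # last group: universally quantified
        cases.append(prefix + [f"forall Vexp. not({exp})"])
    return cases

cases = negation_cases("I", ["D1", "D2"], ["R1"], "Dexp & Rexp")
```

For two disequalities and one subgoal this gives five conjunctions: the negated equalities, one case per disequality, one per subgoal, and the final universally quantified case.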

3 Implementation Issues 

Having described the theoretical algorithm, including important details, we now discuss 
important aspects for a practical implementation, including how to compute the frontier 
and manage answer constraints. 

⁴ There are, of course, no universally quantified variables in an equality.

3.1 Code Expansion

The first issue is how to get the frontier of a goal. It is possible to handle the code of 
clauses during the execution thanks to the Ciao package system [7], which allows the 
code to be expanded at run time. The expansion is implemented in the cneg.pl package 
which is included in the declaration of the module that is going to be expanded (i.e. 
where there are goals that are negations). 

Note that a similar, but less efficient, behaviour can be emulated using metapro- 
gramming facilities, available in most Prolog compilers. 

3.2 Disequality Constraints 

An instrumental step for managing negation is to be able to handle disequalities between
terms such as $t_1 \neq t_2$. The typical Prolog resources for handling these disequalities are
limited to the built-in predicate \==/2, which needs both terms to be ground because
it always succeeds in the presence of free variables. It is clear that a variable needs to
be bound with a disequality to achieve a “constructive” behaviour. Moreover, when an
equation X = t(Y) is negated, the free variables in the equation must be universally
quantified, unless affected by a more external quantification, i.e. $\forall Y\, X \neq t(Y)$ is the
correct negation. As we explained in [17], the inclusion of disequalities and constrained
answers has a very low cost. From the theoretical point of view, it incorporates negative
normal form constraints (in the form of a conjunction of equalities plus a conjunction of
disjunctions of possibly universally quantified disequations) instead of simple bindings, as
the decomposition step can produce disjunctions. On the implementation side, attributed
variables are used, which associate a data structure containing a normal form constraint
with any variable:

\[ \underbrace{\bigwedge_j \left( \forall \bar{Z}_j^1\,(Y_j^1 \neq s_j^1) \vee \dots \vee \forall \bar{Z}_j^{n}\,(Y_j^{n} \neq s_j^{n}) \right)}_{\text{negative information}} \]

Additionally, a Prolog predicate =/=/2 has been defined, used to check
disequalities, similarly to explicit unification (=). Each constraint is a disjunction of conjunctions
of disequalities. A universal quantification in a disequality (e.g., $\forall Y\, X \neq c(Y)$) is
represented with a new constructor fA/1 (e.g., X =/= c(fA(Y))). Due to lack of
space, we refer the interested reader to [17] for the details.
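
The idea of attaching disequalities to an unbound variable and checking them when the variable is later bound can be illustrated with a deliberately simplified Python stand-in. The class name `Store` and its methods are hypothetical. The sketch only handles disequalities between a variable and a ground term, with no universal quantification and no disjunctions, and it does not reflect the attributed-variable machinery of Ciao.

```python
class Store:
    """Toy constraint store: bindings plus forbidden ground values per variable."""

    def __init__(self):
        self.bindings = {}     # variable name -> ground term
        self.forbidden = {}    # variable name -> set of ground terms it must differ from

    def diseq(self, var, term):
        """Post var =/= term; succeed immediately or record the constraint."""
        if var in self.bindings:
            return self.bindings[var] != term
        self.forbidden.setdefault(var, set()).add(term)
        return True

    def bind(self, var, term):
        """Unify var with a ground term, respecting recorded disequalities."""
        if var in self.bindings:
            return self.bindings[var] == term
        if term in self.forbidden.get(var, set()):
            return False       # binding would violate a stored disequality
        self.bindings[var] = term
        return True

store = Store()
posted = store.diseq("X", "0")                 # X =/= 0 is recorded, X stays unbound
rejected = store.bind("X", "0")                # fails: contradicts X =/= 0
accepted = store.bind("X", ("s", "0"))         # succeeds: s(0) is allowed
```

Unlike a test that simply succeeds on unbound variables, the recorded constraint keeps affecting later bindings, which is the “constructive” behaviour the text refers to.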

3.3 Optimizing the Algorithm and the Implementation 

Our constructive negation algorithm and the implementation techniques admit some ad- 
ditional optimizations that can improve the runtime behaviour of the system. Basically, 
the optimizations rely on the compact representation of information, as well as the early 
detection of successful or failing branches. 

Compact information. In our system, negative information is represented quite
compactly thanks to our constraint normal form, providing fewer solutions from the
negation of $I$ and $D_{imp}$. The advantage is twofold. On the one hand, constraints contain more
information and failing branches can be detected earlier (i.e. the search space could
be smaller). On the other hand, if we ask for all solutions using backtracking, we are
cutting the search tree by offering all the solutions together in a single answer. For
example, we can offer a single answer for the negation of a predicate p (the code for p is
skipped because it is not relevant for the example):

?- cneg(p(X,Y,Z,W)).

(X=/=0, Y=/=s(Z)) ; (X=/=Y) ; (X=/=Z) ;
(X=/=W) ; (X=/=s(0), Z=/=0) ? ;
no

(which is equivalent to the formula
$(X \neq 0 \wedge Y \neq s(Z)) \vee X \neq Y \vee X \neq Z \vee X \neq W \vee (X \neq s(0) \wedge Z \neq 0)$,
which can be represented in our constraint normal form and, therefore,
managed by attributes attached to the involved variables), instead of returning the five
equivalent answers upon backtracking:

?- cneg(p(X,Y,Z,W)).

X=/=0, Y=/=s(Z) ? ;
X=/=Y ? ;
X=/=Z ? ;
X=/=W ? ;
X=/=s(0), Z=/=0 ? ;
no

In this case we get the whole disjunction at once instead of getting it by backtracking
step by step. The generation of compact formulas in the negation of subformulas (see
Second step above) is used whenever possible (in the negation of $I$ and the negation
of $D_{imp}$). The negation of $R_{imp}$ and the negation of $(D_{exp} \wedge R_{exp})$ can have infinitely
many solutions, whose disjunction would be impossible to compute. So, for these cases we
construct the solutions incrementally using backtracking.

Pruning subgoals. The frontiers generation search tree can be cut with a double action 
over the ground subgoals: removing the subgoals whose failure we are able to detect 
early on, and simplifying the subgoals that can be reduced to true. Suppose we have a 
predicate p / 2 defined as 

p (X , Y) : - greater (X,Y) , q(X,Y,Z), r(Z). 

where q /3 and r j 1 are predicates defined by several clauses with a complex computa- 
tion. To negate the goal p(s( 0 ), 5(5(0))), its frontier is computed: 
Frontier(p(s( 0 ),s(s( 0 )))) = 

Step 1 X = s( 0 ) A Y = 5(5(0)) A greater(X,Y) A q(X,Y,Z) A r(Z) = 

Step 2 great er{s{ 0 ), s{s{ 0 ))) f\q{s{ 0 ),s{s{ 0 )),Z) /\r(/Z) = 

Step 3 fail A 17(5(0), 5(5(0)), Z) A r(Z) = 

Step 4 fail 

The first step is to expand the code of the subgoals of the frontier to the combination
of the code of all their clauses (a disjunction of conjunctions in general, but only one
conjunction in this case because p/2 is defined by only one clause), and the result will
be a very complicated and hard to check frontier. However, the process is optimized by
evaluating ground terms (Step 2). In this case, greater(s(0),s(s(0))) fails and, therefore,
it is not necessary to continue with the generation of the frontier, because the result is
reduced to fail (i.e. the negation of p(s(0),s(s(0))) will be trivially true). The opposite
example is a simplification of a successful term in the third step:
Frontier(p(s(s(0)),s(0))) $\equiv$

Step 1  $X = s(s(0)) \wedge Y = s(0) \wedge greater(X,Y) \wedge q(X,Y,Z) \wedge r(Z) \equiv$

Step 2  $greater(s(s(0)),s(0)) \wedge q(s(s(0)),s(0),Z) \wedge r(Z) \equiv$

Step 3  $true \wedge q(s(s(0)),s(0),Z) \wedge r(Z) \equiv$

Step 4  $q(s(s(0)),s(0),Z) \wedge r(Z)$
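
The pruning of ground subgoals can be illustrated with a small sketch. It is not the
code of the Ciao library: it is a Python approximation in which Peano numerals are nested
tuples and variables are capitalised strings, and the names `prune` and `evaluators` are
hypothetical, introduced only for this illustration. A ground subgoal with a known
evaluator is executed immediately; if it fails the whole frontier collapses to fail, and
if it succeeds it is dropped as true.

```python
ZERO = "0"  # the Peano numeral 0; s(N) is represented as ("s", N)


def s(n):
    return ("s", n)


def is_ground(term):
    """A term is ground when it contains no variable (a capitalised string)."""
    if isinstance(term, str):
        return not term[:1].isupper()
    return all(is_ground(arg) for arg in term)


def greater(x, y):
    """greater(X, Y) on ground Peano numerals: true when X > Y."""
    while x != ZERO and y != ZERO:
        x, y = x[1], y[1]
    return x != ZERO


def prune(frontier, evaluators):
    """Simplify a conjunction of subgoals given as (name, args) pairs.

    Returns "fail" as soon as a ground subgoal fails; otherwise returns the
    remaining subgoals, with every successful ground subgoal removed.
    """
    remaining = []
    for name, args in frontier:
        if name in evaluators and all(is_ground(a) for a in args):
            if not evaluators[name](*args):
                return "fail"      # early failure: stop generating the frontier
            continue               # reduced to true: drop the subgoal
        remaining.append((name, args))
    return remaining


one, two = s(ZERO), s(s(ZERO))
evaluators = {"greater": greater}
failing = [("greater", (one, two)), ("q", (one, two, "Z")), ("r", ("Z",))]
succeeding = [("greater", (two, one)), ("q", (two, one, "Z")), ("r", ("Z",))]
```

Under these assumptions, `prune(failing, evaluators)` mirrors Steps 2 to 4 of the first
frontier, and `prune(succeeding, evaluators)` mirrors the second one, where only the q
and r subgoals remain.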

Constraint simplification. During the whole process for negating a goal, the frontier
variables are constrained. In cases where the constraints are satisfiable, they can be
eliminated, and where the constraints can be reduced to fail, the evaluation can be
stopped with result true.

We focus on the negative information of a normal form constraint F:

$$F \equiv \bigvee_i \bigwedge_j \forall \overline{z}_j^i \, (Y_j^i \neq s_j^i)$$

Firstly, the Prenex form [21] can be obtained by extracting the universal variables,
with different names, to the head of the formula by applying logic rules:

$$F \equiv \forall \overline{x} \, \bigvee_i \bigwedge_j (Y_j^i \neq s_j^i)$$

and using the distributive property (notice that the subindexes are different):

$$F \equiv \forall \overline{x} \, \bigwedge_k \bigvee_l (Y_l^k \neq s_l^k)$$

The formula can be separated into subformulas that are simple disjunctions of
disequalities:

$$F \equiv \bigwedge_k \forall \overline{x} \, \bigvee_l (Y_l^k \neq s_l^k) \equiv F_1 \wedge \ldots \wedge F_n$$

Each single formula $F_k$ can be evaluated. The first step will be to substitute the
existentially quantified variables (variables that do not belong to $\overline{x}$) by
Skolem constants that will keep the equivalence without losing generality:

$$F_k \equiv \forall \overline{x} \, \bigvee_l (Y_l^k \neq s_l^k) \rightarrow \forall \overline{x} \, \bigvee_l (Y_{Skl}^k \neq s_{Skl}^k)$$

Then it can be transformed into:

$$F_k \equiv \neg \exists \overline{x} \, \neg \Big( \bigvee_l (Y_{Skl}^k \neq s_{Skl}^k) \Big) \equiv \neg Fe_k$$

The meaning of $F_k$ is the negation of the meaning of $Fe_k$:

$$Fe_k \equiv \exists \overline{x} \, \neg \Big( \bigvee_l (Y_{Skl}^k \neq s_{Skl}^k) \Big)$$

Solving the negations, the result is obtained through simple unifications of the
variables of $\overline{x}$:

$$Fe_k \equiv \exists \overline{x} \, \bigwedge_l \neg (Y_{Skl}^k \neq s_{Skl}^k) \equiv \exists \overline{x} \, \bigwedge_l (Y_{Skl}^k = s_{Skl}^k)$$

Therefore, we get the truth value of $F_k$ from the negation of the value of $Fe_k$ and,
finally, the value of F is the conjunction of the values of all $F_k$. If F succeeds, then
the constraint is removed because it is redundant and we continue with the negation
process. If it fails, then the negation directly succeeds.

4 Experimental Results 

Our prototype is a simple library that is added to the set of libraries of Ciao Prolog. 
Indeed, it is easy to port the library to other Prolog compilers. The only requirement is 
that attributed variables should be available. 

This section reports some experimental results from our prototype implementation. 
First of all, we show the behaviour of the implementation in some simple examples. 

4.1 Examples 

The interesting side of this implementation is that it returns constructive results from
a negative question. Let us start with a simple example involving predicate boole/1.

boole(0).
boole(1).

?- cneg(boole(X)).
X=/=1, X=/=0 ? ;
no



Another simple example obtained from [23] gives us the following answers:

p(a,b,c).
p(b,a,c).
p(c,a,b).
proof1(X,Y,Z) :-
    X =/= a, Z = c,
    cneg(p(X,Y,Z)).

?- proof1(X,Y,Z).
Z = c, X=/=b, X=/=a ? ;
Z = c, Y=/=a, X=/=a ? ;
no



[23] contains another example showing how a constructive answer ($\forall T.\ X \neq s(T)$)
is provided for the negation of an undefined goal in Prolog:

p(X) :- X = s(T), q(T).        ?- r(X).
q(T) :- q(T).                  X=/=s(fA(_A)) ?
r(X) :- cneg(p(X)).            yes

Notice that if we asked for a second answer, the query would loop according to the
Prolog resolution. An example with an infinite number of solutions is more interesting.



positive(0).
positive(s(X)) :-
    positive(X).

?- cneg(positive(X)).

X=/=s(fA(_A)), X=/=0 ? ;

X = s(_A),
(_A=/=s(fA(_B)), _A=/=0) ? ;

X = s(s(_A)),
(_A=/=s(fA(_B)), _A=/=0) ? ;

X = s(s(s(_A))),
(_A=/=s(fA(_B)), _A=/=0) ?

yes



4.2 Implementation Measures 



We first measured the execution times in milliseconds for the above examples
when using negation as failure (naf/1) and constructive negation (cneg/1). A '-' in a
cell means that negation as failure is not applicable. Some goals were executed a number
of times to get a significant measurement. All measurements were made using Ciao Prolog^5
1.5 on a Pentium II at 350 MHz. The results are shown in Table 1. We have added a
first column with the runtime of the evaluation of the positive goal that is negated in
the other columns and a last column with the ratio that measures the speedup of the naf
technique w.r.t. constructive negation.

Using naf instead of cneg results in small ratios, around 1.06 on average, for ground
calls with few recursive calls. So, the possible slow-down for constructive negation is
not as high as we might expect for these examples. Furthermore, the results are rather
similar. But the same goals with data that involve many recursive calls yield ratios near
14.69 on average w.r.t. naf, increasing exponentially with the number of recursive calls.

5 The negation system is coded as a library module (“package” [7]), which includes the respec- 
tive syntactic and semantic extensions (i.e. Ciao’s attributed variables). Such extensions apply 
locally within each module which uses this negation library. 
Table 1. Runtime comparison (times in milliseconds)

goals                            Goal    naf(Goal)   cneg(Goal)   ratio
boole(1)                         2049    2099        2069         0.98
boole(8)                         2070    2170        2590         1.19
positive(s(s(s(s(s(s(0)))))))    2079    1600        2159         1.3
positive(s(s(s(s(s(0))))))       2079    2139        2060         0.96
greater(s(s(s(0))),s(0))         2110    2099        2100         1.00
greater(s(0),s(s(s(0))))         2119    2129        2089         0.98
average                                                           1.06
positive(500000)                 2930    2949        41929        14.21
positive(1000000)                3820    3689        81840        22.18
greater(500000,500000)           3200    3339        22370        7.70
average                                                           14.69
boole(X)                         2080    -           3109
positive(X)                      2020    -           7189
greater(s(s(s(0))),X)            2099    -           6990
greater(X,Y)                     7040    -           7519
queens(s(s(0)),Qs)               6939    -           9119


There are, of course, many goals that cannot be negated using the naf technique and 
that are solved using constructive negation. 

5 Finite Constructive Negation 

The problem with the constructive negation algorithm is of course efficiency. It is the
price that has to be paid for a powerful mechanism that negates any kind of goal.
Thinking of Prolog programs, many goals have a finite number of solutions (we are
also assuming that this can be discovered in finite time, of course). There is a
simplification of the constructive negation algorithm that we use to negate these goals.
It is very simple in the sense that if we have a goal $\neg G$ where the solution of the
positive subgoal $G$ is a set of $n$ solutions like $\{S_1, S_2, \ldots, S_n\}$, then we
can consider these equivalences:
$\neg G \equiv \neg(S_1 \vee S_2 \vee \ldots \vee S_n) \equiv (\neg S_1 \wedge \neg S_2 \wedge \ldots \wedge \neg S_n)$

Of course, these solutions are a conjunction of unifications (equalities) and
disequality constraints. As described in section 3.2, we know how to handle and negate this
kind of information. The implementation of the predicate cnegf/1 is something akin to

cnegf(Goal) :-
    varset(Goal,GVars),              % Getting variables of the Goal
    setof(GVars,Goal,LValores), !,   % Getting the solutions
    cneg_solutions(GVars,LValores).  % Negating solutions
cnegf(_Goal).     % Without solutions, the negation succeeds

where cneg_solutions/2 is the predicate that negates the disjunction of conjunctions of
solutions of the goal that we are negating. It works as described in section 2.3, but it is
simpler, because here we are only negating equalities and disequalities.
We get the set of variables, GVars, of the goal, Goal, that we want to negate (we
use the predicate varset/2). Then we use the setof/3 predicate to get the values of
the variables of GVars for each solution of Goal. For example, if we want to evaluate
cnegf(boole(X)), then we get varset(boole(X),[X]), setof([X],boole(X),[[0],[1]])
(i.e. $X = 0 \vee X = 1$) and cneg_solutions/2 will return $X \neq 0 \wedge X \neq 1$.

If we have the goal p(X,Y), which has two solutions X = a, Y = b and X = c, Y = d,
then, in the evaluation of cnegf(p(X,Y)), we will get varset(p(X,Y),[X,Y]),
setof([X,Y],p(X,Y),[[a,b],[c,d]]) (i.e. $(X = a \wedge Y = b) \vee (X = c \wedge Y = d)$) and
cneg_solutions/2 will return the four solutions
$(X \neq a \wedge X \neq c) \vee (X \neq a \wedge Y \neq d) \vee (Y \neq b \wedge X \neq c) \vee (Y \neq b \wedge Y \neq d)$.
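
The two worked examples can be reproduced with a short sketch. This is a Python
illustration rather than the Prolog predicate itself: `negate_solutions` is a hypothetical
name, each solution is assumed to contain only equalities (disequality constraints inside
solutions are not handled), and a disequality is written as a `(variable, "=/=", value)`
triple. Negating a disjunction of conjunctions amounts to choosing one negated equality
from every solution, which is a Cartesian product.

```python
from itertools import product


def negate_solutions(solutions):
    """Negate a disjunction of conjunctions of equalities.

    `solutions` is a list of solutions; each solution is a list of
    (variable, value) equalities. The result is a list of answers (a
    disjunction); each answer is a list (a conjunction) of disequalities.
    """
    answers = []
    for choice in product(*solutions):  # one equality taken from each solution
        answers.append([(var, "=/=", value) for var, value in choice])
    return answers


boole_solutions = [[("X", 0)], [("X", 1)]]
p_solutions = [[("X", "a"), ("Y", "b")], [("X", "c"), ("Y", "d")]]
```

With these inputs, `negate_solutions(boole_solutions)` yields the single answer
X =/= 0, X =/= 1, and `negate_solutions(p_solutions)` yields the four answers listed in
the text, in the same order. An empty list of solutions yields one empty answer, which
corresponds to the case where the negation simply succeeds.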

5.1 Analysis of the Number of Solutions 

The optimization above is very intuitive but, perhaps, the main problem is to detect
when a goal is going to have a finite number of solutions. To get sound results, we are
going to use this technique (finite constructive negation) just to negate the goals that
we are sure do not have infinite solutions. So, our analysis is conservative.

We use a combination of two analyses to determine if a goal G can be negated
with cnegf/1: the non-failure analysis (if G does not fail) and the analysis of upper
cost [14] (if G has a finite upper bound on its cost). Both are implemented in the Ciao
Prolog precompiler [12]. Indeed, finite constructive negation can handle the negation
of failure, which is success. So the finite upper cost analysis is enough in practice. We
test these analyses at compilation time and then, when possible, we directly execute the
optimized version of constructive negation at compilation time.

It is more complicated to check this at execution time, although we could provide a
rough approximation. First, we ask for a maximum number N of solutions of G (we can use
the library predicate findnsols/4) and then we check the number of solutions that we
have obtained. If it is less than N, we can be sure that G has a finite number of
solutions; otherwise we do not know.
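
The run-time approximation can be sketched as follows. This is only an analogy in
Python, not the Ciao predicate findnsols/4: solutions are modelled as a generator, and
`bounded_finiteness_check`, `boole` and `positive` are hypothetical names used for the
illustration. At most `limit` solutions are requested; obtaining fewer than `limit`
proves finiteness, while obtaining exactly `limit` proves nothing.

```python
from itertools import islice


def bounded_finiteness_check(solutions, limit):
    """Ask for at most `limit` solutions from an iterator of solutions.

    Returns (True, found) when fewer than `limit` solutions exist, so the goal
    is certainly finite, and (None, found) when the bound was reached, in
    which case nothing can be concluded.
    """
    found = list(islice(solutions, limit))
    if len(found) < limit:
        return True, found
    return None, found


def boole():
    """Solutions of boole(X): a finite goal."""
    yield 0
    yield 1


def positive():
    """Solutions of positive(X): an infinite goal (0, 1, 2, ...)."""
    n = 0
    while True:
        yield n
        n += 1
```

For instance, `bounded_finiteness_check(boole(), 10)` reports a finite goal with both
solutions, whereas `bounded_finiteness_check(positive(), 10)` stops after ten solutions
and returns `None` as its verdict.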

5.2 Experimental Results 

We have implemented a predicate cnegf/1 to negate the disjunction of all the solutions
of its argument (a goal). The implementation of this predicate takes advantage of
backtracking to obtain only the information that we need to get the first answer. Then,
if the user asks for another answer, the backtracking gets enough information to provide
it. Accordingly, we avoid the complete evaluation of the negation of all the solutions
first time round. We negate the subterms only when we need to provide the next solution.
In this sense, if we have the goal $G$ (where each $S_i$ is the conjunction of $N_i$
equalities or disequalities)

$$G \equiv S_1 \vee \ldots \vee S_n \equiv (S_1^1 \wedge \ldots \wedge S_1^{N_1}) \vee \ldots \vee (S_n^1 \wedge \ldots \wedge S_n^{N_n})$$

and we then want to obtain $\neg G$, then we have

$$\neg G \equiv \neg(S_1 \vee S_2 \vee \ldots \vee S_n) \equiv (\neg S_1 \wedge \neg S_2 \wedge \ldots \wedge \neg S_n) \equiv \neg(S_1^1 \wedge \ldots \wedge S_1^{N_1}) \wedge \ldots \wedge \neg(S_n^1 \wedge \ldots \wedge S_n^{N_n})$$

$$\equiv (\neg S_1^1 \vee \ldots \vee \neg S_1^{N_1}) \wedge \ldots \wedge (\neg S_n^1 \vee \ldots \vee \neg S_n^{N_n}) \equiv (\neg S_1^1 \wedge \ldots \wedge \neg S_n^1) \vee \ldots \vee (\neg S_1^{N_1} \wedge \ldots \wedge \neg S_n^{N_n})$$

We begin by calculating just the first answer of the negation, which will be
$(\neg S_1^1 \wedge \ldots \wedge \neg S_n^1)$,
and the rest will be calculated if necessary using backtracking.
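
The on-demand behaviour can be mimicked with a generator. The sketch below is a
Python analogy of backtracking, not the Prolog implementation: `negated_answers` is a
hypothetical name, solutions are assumed to contain only equalities, and disequalities are
written as `(variable, "=/=", value)` triples. The first answer is built from the first
component of every solution, and further answers are computed only when the caller asks
for them.

```python
def negated_answers(solutions, index=0):
    """Lazily enumerate the answers of the negation of a list of solutions.

    Each solution is a list of (variable, value) equalities. Every answer
    picks one equality per solution and negates it; answers are produced one
    at a time, in the order in which backtracking would find them.
    """
    if index == len(solutions):
        yield []
        return
    for var, value in solutions[index]:
        for rest in negated_answers(solutions, index + 1):
            yield [(var, "=/=", value)] + rest


p_solutions = [[("X", "a"), ("Y", "b")], [("X", "c"), ("Y", "d")]]
answers = negated_answers(p_solutions)
first_answer = next(answers)    # only this answer has been computed so far
second_answer = next(answers)   # produced on demand, like asking for ';'
```

Here `first_answer` is X =/= a, X =/= c, built from the first equality of each
solution, and the remaining answers stay uncomputed until requested.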
Table 2. Runtime comparison of finite constructive negation

goals                                Goal    cneg(Goal)   cnegf(Goal)   ratio
boole(1)                             821     831          822           1.01
positive(s(s(s(0))))                 811     1351         860           1.57
greater(s(0),s(s(0)))                772     1210         840           1.44
positive(s^500000(0))                1564    8259         3213          2.57
positive(s^/3UUUUU(0))               2846    12445        3255          3.82
greater(s^30000(0),s^30000(0))       1240    66758        30112         2.21
boole(X)                             900     1321         881           1.49
greater(s(s(s(0))),X)                990     1113         1090          1.02
queens(s(s(s(0))),Qs)                1481    50160        1402          35.77


Let us present a simple example: 



?- member(3,[X,Y,Z]).
X = 3 ? ;
Y = 3 ? ;
Z = 3 ? ;
no

?- cnegf(member(3,[X,Y,Z])).
X=/=3, Y=/=3, Z=/=3 ? ;
no

We get the symmetric behavior for the negation of the negation of the initial query:

?- cnegf(cnegf(member(3,[X,Y,Z]))).
X = 3 ? ;
Y = 3 ? ;
Z = 3 ? ;
no



Some timing results are shown in Table 2. The results are much more significant for
more complicated goals. Indeed, the more complicated the code of a predicate is, the
more inefficient its classical constructive negation (cneg) is. However, finite
constructive negation (cnegf) is independent of code complexity. Finite constructive
negation depends on the complexity of the solutions obtained for the positive goal and,
of course, the number of solutions of this goal.

6 Conclusion and Future Work 

After running some preliminary experiments with the classical constructive negation 
technique following Chan’s description, we realized that the algorithm needed some 
additional explanations and modifications. 

Having given a detailed specification of the algorithm, we proceeded to provide a real,
complete and consistent implementation. To our knowledge, it is the first reported work
of a running implementation of constructive negation in Prolog since its definition in
1988. The results we have reported are very encouraging, because we have shown that it
is possible to extend Prolog with a constructive negation module relatively
inexpensively and, above all, without any delay in Prolog programs that are not using
this negation. Nevertheless, it is quite important to address possible optimizations,
and we are working to improve the efficiency of the implementation. These include a
more accurate selection of the frontier based on the demanded form of argument (in the
vein of [15]). The full version of the paper will provide more details as well as these
additional optimizations. Another possible line of future work is to incorporate our
algorithm at the WAM machine level.

In any case, we will probably not be able to provide an efficient enough implemen- 
tation of constructive negation, because the algorithm is inherently inefficient. This is 
why we do not intend to use it either for all cases of negation or for negating goals 
directly. 

Our goal is to design and implement a practical negation operator and incorporate it 
into a Prolog compiler. In [17, 18] we systematically studied what we understood to be 
the most interesting existing proposals: negation as failure (naf) [10], use of delays to 
apply naf securely [19], intensional negation [3,4], and constructive negation [8,9, 11, 
22,23]. As none of them can satisfy our requirements of completeness and efficiency, 
we propose to use a combination of these techniques, where the information from static 
program analyzers could be used to reduce the cost of selecting techniques [18]. So, in 
many cases, we avoid the inefficiency of classical constructive negation. However, we 
still need it because it is the only method that is sound and complete for any kind of 
goals. For example, looking at the goals in Table 1, the strategy will obtain all ground
negations using the naf technique and it would only use classical constructive negation
for the goals with variables where it is impossible to use naf. Otherwise, the strategy
will use finite constructive negation (cnegf) for the three last goals of Table 2 because
the positive goals have a finite number of solutions.

We are testing the implementation and trying to improve the code, and our intention
is to include it in the next version of Ciao Prolog^6.
Abstract. Hybridization of local search and constraint programming
techniques for solving Constraint Satisfaction Problems is generally
restricted to some kind of master-slave combinations for specific classes of
problems. In this paper we propose a theoretical model based on K.R. 
Apt’s chaotic iterations for hybridization of local search and constraint 
propagation. Hybrid resolution can be achieved as the computation of a 
fixpoint of some specific reduction functions. Our framework opens up 
new and finer possibilities for hybridization/combination strategies. We 
also present some combinations of techniques such as tabu search, node 
and bound consistencies. Some experimental results show the interest of 
our model to design such hybridization. 



1 Introduction 

Constraint Satisfaction Problems (CSP) [15] provide a general framework for the 
modeling of many practical applications (planning, scheduling, time tabling,...). 
A CSP is usually defined by a set of variables associated to domains of possible 
values and by a set of constraints. We only consider here CSP over finite domains. 
Constraints can be understood as relations over some variables and therefore, 
solving a CSP consists in finding tuples that belong to each constraint (an as- 
signment of values to the variables that satisfies these constraints). To this end, 
many resolution algorithms have been proposed and we may distinguish at least 
two classes of general methods: 1) complete methods aim at exploring the whole 
search space in order to find all the solutions or to detect that the CSP is not 
consistent. Among these methods, we find methods based on constraint prop- 
agation, one of the most common techniques from constraint programming [4] 
(CP) for solving CSP; and 2) incomplete methods (such as Local Search [1] (LS)) 
mainly rely on the use of heuristics providing a more efficient exploration of 
interesting areas of the search space in order to find some solutions. 

A common idea is to build more efficient and robust algorithms by combining 
several resolution paradigms in order to take advantage of their respective assets 
(e.g., [5] presents an overview of possible uses of LS in CP). The benefit of the 
hybridization LS+CP is well-known and does not have to be proven (see e.g., [8, 
13, 12, 14]). Most of the previous works are either algorithmic approaches which 



B. Demoen and V. Lifschitz (Eds.): ICLP 2004, LNCS 3132, pp. 299-313, 2004. 
(c) Springer- Verlag Berlin Heidelberg 2004 




300 Eric Monfroy, Frederic Saubion, and Tony Lambert 



define a kind of master-slave combination (e.g., LS to guide the search in CP, 
or CP to reduce interesting area of the search space explored in LS), or ad-hoc 
realizations of systems for specific classes of problems. 

In this paper, we are concerned with a model for hybridization in which local 
search [1] and constraint propagation [4] are broken up into their component 
parts. These basic operators can then be managed at the same level by a single 
mechanism. In this framework, properties concerning solvers (e.g., termination, 
solutions) can be easily expressed and established. This framework also opens 
up new and finer possibilities of combination strategies. 

Our model is based on K.R. Apt’s chaotic iterations [2] which define a math- 
ematical framework for iteration of a finite set of functions over “abstract” do- 
mains with partial ordering. This framework is well-suited for solving CSPs with 
constraint propagation: domains are instantiated with variable domains (possible 
values of variables), and functions with domain reduction functions to remove 
inconsistent (w.r.t. constraints) values of domain variables (reduction functions 
abstract the notion of constraint in this mechanism) . 

Moreover, to get a complete solver (a solver which is always able to determine 
whether a CSP has some solutions), constraint propagation is generally associ- 
ated with a splitting mechanism (a technique such as enumeration or bisection) 
to cut the search space into some smaller search spaces from which one can hope 
to perform more propagation. Propagation and splitting are interleaved until the 
solutions are reached. 

In our model, Local Search [1] is based on 3 notions: samples which are 
particular points or sets of points of the search space, neighborhood that defines 
which samples can be reached from a specific sample, and a fitness function that 
defines the “quality” of a sample. Then, LS explores the search space by moving 
from sample to sample guided by the fitness function in order to reach a local 
optimum. 

For our purpose, we introduce in our model the notion of sCSP (sampled 
CSP) which is an extension of CSP with a path (list) of samples (generally points 
of the search space). We also integrate the splitting as some reduction functions. 
This way, the “abstract” domains of chaotic iteration are instantiated with union 
of sCSPs. Usual domain reduction functions (used for constraint propagation) 
are extended to fit this new domain. Some new functions (the LS functions) 
are also introduced to jump from samples to samples: these functions have the 
sufficient properties required to be used in the chaotic iteration algorithm. 

Thus, in the chaotic iteration framework, constraint propagation functions,
local search moves, and splitting functions are considered at the same level and
all apply to unions of sCSPs. Since the interleaving and order of application of these
functions are totally free, this framework enables one to design finer strategies for
hybridization than the usual master-slave combinations. Moreover, termination
of the realized solvers is straightforward: it corresponds to reaching a fixpoint of
the reduction functions.

In order to illustrate our framework, we realized some hybrid solvers using 
some well-known techniques and strategies such as tabu search (for LS), node 
and arc consistencies (for propagation), and bisection functions (for splitting). 
We obtained some experimental results that show the interest of combination
strategies compared to a single use of these methods. Moreover, the combination 
strategies can easily be designed and specified, and their properties can be proven 
in our model. 

This paper is organized as follows. Section 2 describes constraint propaga- 
tion and local search in terms of basic operators. We present our framework 
for hybridization in Section 3. Some experiments are given in Section 4 before 
concluding in Section 5. 

2 Constraint Satisfaction Problems 

In this section we recall basic notions related to Constraint Satisfaction Prob- 
lems (CSP) [15] together with their resolution principles. Complete resolution 
is presented using the theoretical model developed by K.R. Apt [2,3]. We then 
briefly describe the main lines of a local search process. 

A CSP is a tuple $(X, D, C)$ where $X = \{x_1, \ldots, x_n\}$ is a set of variables
taking their values in their respective domains $D = \{D_1, \ldots, D_n\}$. A constraint
$c \in C$ is a relation $c \subseteq D_1 \times \cdots \times D_n$ ^1. In order to simplify
notations, $D$ will also denote the Cartesian product of the $D_i$ and $C$ the union of its
constraints. A tuple $d \in D$ is a solution of a CSP $(X, D, C)$ if and only if
$\forall c \in C,\ d \in c$.

2.1 Solving a CSP with Complete Resolution Techniques 

As mentioned in the introduction, complete resolution methods are mainly based
on a systematic exploration of the search space, which corresponds to the set of
possible tuples. To avoid the combinatorial explosion of this exploration, these
methods use particular heuristics to prune the search space. The most popular of
these techniques (i.e., constraint propagation) is based on local consistency
properties. A local consistency (e.g., [9, 11]) is a property of the constraints which
allows the search mechanisms to delete from variable domains the values which cannot
lead to solutions. We may mention node consistency and arc consistency [10] as famous
examples of local consistencies. Complete search algorithms use constraint propagation
techniques and splitting. Constraint propagation consists in examining a subset $C'$
of $C$ (usually a single constraint), deleting some inconsistent values (from a local
consistency point of view) from the domains of variables appearing in $C'$, and
propagating this domain reduction to the domains of variables appearing in
$C \setminus C'$. When no more propagation is possible and the solutions are not
reached, the CSP is split into sub-CSPs (generally, the domain of a variable is split
into two sub-domains, leading to two sub-CSPs) on which propagation is applied again,
and so on until the solutions are reached.

K.R. Apt proposed in [2, 3] a general theoretical framework for modeling
such reduction operators. In this context, domain reduction corresponds to the
computation of a fixpoint of a set of functions over a partially ordered set. These
functions, called reduction functions, abstract the notion of constraint.

1 Note that for the sake of simplicity, we consider that each constraint is over all the
variables $x_1, \ldots, x_n$. However, one can consider constraints over some of the $x_i$.
Then, the notion of scheme [2, 3] can be used to denote sequences of variables.

The computation of the least common fixpoint of a set of functions F is 
achieved by the following algorithm: 

GI: Generic Iteration Algorithm

d := $\bot$;
G := F;
while G $\neq \emptyset$ do
    choose g $\in$ G;
    G := G $-$ {g};
    G := G $\cup$ update(G, g, d);
    d := g(d);
endwhile

where G is the current set of functions still to be applied ($G \subseteq F$), d is an
element of a partially ordered set (the domains in the case of a CSP), and for all G, g, d
the set of functions update(G, g, d) from F is such that:

- $\{f \in F - G \mid f(d) = d \wedge f(g(d)) \neq g(d)\} \subseteq update(G, g, d)$.

- $g(d) = d$ implies that $update(G, g, d) = \emptyset$.

- $g(g(d)) \neq g(d)$ implies that $g \in update(G, g, d)$.

Suppose that all functions in F are inflationary ($x \sqsubseteq f(x)$ for all x) and
monotonic ($x \sqsubseteq y$ implies $f(x) \sqsubseteq f(y)$ for all x, y) and that
$(D, \sqsubseteq)$ is finite. Then, every execution of the GI algorithm terminates and
computes in d the least common fixpoint of the functions from F (see [2]).

Note that in the following we consider only finite partial orderings. Constraint
propagation is now achieved by instantiating the GI algorithm:

- the $\sqsubseteq$ ordering is instantiated by $\supseteq$, the usual set inclusion,

- d := $\bot$ corresponds to d := $D_1 \times \ldots \times D_n$, the Cartesian product
of the domains of the variables from the CSP,

- F is a set of monotonic and inflationary functions (called domain reduction
functions) which abstract the constraints to reduce domains of variables. For
example, one of the domain reduction functions to reduce Boolean variables
using an and(X,Y,Z)^2 constraint is defined by: if the domain of Z is {1},
then the domains of X and Y must be reduced to {1}.

The result is the smallest box (i.e., Cartesian product of domains) w.r.t. the 
given domain reduction functions that contains the solutions of the CSP. 
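
The GI algorithm and its instantiation for constraint propagation can be sketched in a
few lines of Python. This is an illustration under simplifying assumptions, not the
authors' implementation: domains are a dictionary from variable names to sets, the
`update` step is replaced by the most conservative choice allowed by its three conditions
(re-schedule every function whenever the domains change, nothing otherwise), and
`and_z_true` and `z_is_one` are hypothetical reduction functions, the first being the
and(X,Y,Z) rule quoted above and the second a unary constraint Z = 1.

```python
def generic_iteration(functions, d):
    """Apply reduction functions until none of them changes the domains."""
    pending = list(functions)            # G := F
    while pending:                       # while G is not empty
        g = pending.pop()                # choose g in G; G := G - {g}
        new_d = g(d)
        if new_d != d:
            # Conservative update(G, g, d): every function not already
            # pending (including g itself) is scheduled again.
            for f in functions:
                if f not in pending:
                    pending.append(f)
        d = new_d                        # d := g(d)
    return d


def and_z_true(d):
    """and(X,Y,Z): if the domain of Z is {1}, reduce X and Y to {1}."""
    if d["Z"] == {1}:
        return {**d, "X": d["X"] & {1}, "Y": d["Y"] & {1}}
    return d


def z_is_one(d):
    """Unary constraint Z = 1: remove every other value from the domain of Z."""
    return {**d, "Z": d["Z"] & {1}}


initial = {"X": {0, 1}, "Y": {0, 1}, "Z": {0, 1}}
reduced = generic_iteration([and_z_true, z_is_one], initial)
```

In this run `reduced` maps X, Y and Z to {1}: the unary constraint first fixes Z, which
re-schedules the and rule, which in turn reduces X and Y. With `and_z_true` alone the
domains are already a fixpoint and are returned unchanged.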

At this point, in order to get the solutions of the CSP, one has to explore 
the reduced domains by enumeration or splitting techniques (and then, again, 
propagation, and so on). This usually implies an algorithmic process interleaving 
splitting and propagation phases. However, in the following, we will integrate 
splitting as a reduction function inside the GI algorithm, and we will extend the 
notion of CSP to sampled CSP, on which another type of reduction function
will be applied to mimic basic operations of local search algorithms.

2 and(X, Y, Z) represents the Boolean relation $X \wedge Y = Z$.

2.2 Solving CSP with Local Search

Given an optimization problem (which can be minimizing the number of vio- 
lated constraints and thus trying to find a solution of the CSP), local search 
techniques [1] aim at exploring the search space, moving from a configuration to 
one of its neighbors. These moves are guided by a fitness function which eval- 
uates the benefit of such a move in order to reach a local optimum. We will 
generalize the definition of local search in next sections. 

For the resolution of a CSP $(X, D, C)$, the search space can usually be defined
as the set of possible tuples of $D = D_1 \times \cdots \times D_n$ and the neighborhood
is a mapping $\mathcal{N} : D \to 2^D$. This neighborhood function defines the possible
moves from a configuration (a tuple) to one of its neighbors and therefore fully
defines the exploration landscape. The fitness (or evaluation) function eval is
related to the notion of solution and can be defined as the number of constraints
c such that $t \notin c$ (t being a tuple from D).

In this case, the problem to solve is indeed a minimization problem. Given a
configuration $d \in D$, two basic strategies can be identified in order to continue
the exploration of D:

- intensification: choose $d' \in \mathcal{N}(d)$ such that $eval(d') < eval(d)$.

- diversification: choose any other neighbor $d'$.

The intensification process only performs improving moves while the diversification
strategy allows the process to move to a worse neighbor w.r.t. the eval function.
Any local search algorithm is based on the management of these basic heuristics 
by introducing specific control features. Therefore, a local search algorithm can 
be considered as a sequence of moves on a structure ordered according to the 
evaluation function. 
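
A minimal sketch of these notions, assuming a toy CSP that does not come from the paper:
two variables over {0, 1, 2} with the hypothetical constraints X < Y and X + Y = 3. The
neighborhood changes the value of exactly one variable, `evaluate` counts violated
constraints, and `intensify` performs only improving moves (here it picks the best
neighbor, which is one possible intensification policy among others).

```python
def evaluate(t, constraints):
    """Number of constraints violated by tuple t (0 means t is a solution)."""
    return sum(1 for c in constraints if not c(t))


def neighbors(t, domains):
    """All tuples that differ from t in exactly one position."""
    for i, dom in enumerate(domains):
        for v in dom:
            if v != t[i]:
                yield t[:i] + (v,) + t[i + 1:]


def intensify(start, domains, constraints):
    """Follow strictly improving moves until no neighbor is better.

    Returns the path of visited tuples; the last one is a local optimum.
    """
    path = [start]
    current = start
    while True:
        best = min(neighbors(current, domains),
                   key=lambda n: evaluate(n, constraints))
        if evaluate(best, constraints) >= evaluate(current, constraints):
            return path
        current = best
        path.append(current)


DOMAINS = [(0, 1, 2), (0, 1, 2)]
CONSTRAINTS = [lambda t: t[0] < t[1], lambda t: t[0] + t[1] == 3]
```

Starting from (0, 2), intensification reaches the solution (1, 2) in one move. Starting
from (2, 0), it stops at (2, 1), which still violates one constraint: a local optimum
that only a diversification move could leave.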

3 A Uniform Computational Framework 

From these different CSP resolution approaches, our aim is to integrate the var- 
ious involved computation processes in a uniform description framework. The 
purpose of this section is to instantiate the general computation scheme pre- 
sented in Section 2.1. 

Our idea is to extend the set of usual functions used in the generic iteration 
algorithm with splitting operators and local search strategies. Then, these search 
methods can be viewed as the computation of a fixpoint of a set of functions on
an ordered set. Therefore, the first step of our work consists in defining the main 
structure. 

3.1 Sampling the Search Space 

As we have seen, domain reduction and splitting operate on domains of val- 
ues while local search acts on a different structure, which usually corresponds 
to points of the search space. Here, we propose a more general and abstract 
definition based on the notion of sample. 
Definition 1 (Sample). Given a CSP $(X, D, C)$, we define a sample function
$\varepsilon : D \to 2^D$. By extension, $\varepsilon(D)$ denotes the set
$\{\varepsilon(d) \mid d \in D\}$.

Generally, $\varepsilon(d)$ is restricted to $d$ and $\varepsilon(D) = D$, but it can also
be a scatter of tuples around $d$, an approximation or a box covering $d$ (e.g., for
continuous domains). Moreover, it is reasonable to impose that $\varepsilon(D)$ contains
all the solutions. Indeed, the search space $D$ is abstracted by $\varepsilon(D)$ to be
used by LS.

In this context, a local search can be fully defined by a neighborhood function
on $\varepsilon(D)$ and the set of visited samples for each local search path composed by
a sequence of moves. Given a neighborhood function
$\mathcal{N} : \varepsilon(D) \to 2^{\varepsilon(D)}$, we define the set of possible local
search paths as

$$\mathcal{LS}_D = \bigcup_{i > 0} \{ p = (s_1, \ldots, s_i) \in \varepsilon(D)^i \mid \forall j,\ 1 \leq j \leq i-1,\ s_{j+1} \in \mathcal{N}(s_j) \text{ and } s_1 \in \varepsilon(D) \}$$

since the fundamental property of local search relies on its exploration based
on the neighborhood relation. From a practical point of view, a local search
is limited to finite paths according to a stop criterion which can be a fixed
maximum number of iterations or, in our context of CSP resolution, the fact
that a solution is reached. For this concern, according to Section 2.2, we consider
an evaluation function $eval : \varepsilon(D) \to \mathbb{N}$ such that eval(s) represents
the number of constraints unsatisfied by s and eval(s) is equal to 0 iff s is a solution.
We denote $s <_{eval} s'$ the fact that $eval(s) < eval(s')$.

Therefore, from a LS point of view, a result is either a search path leading 
to a solution or a search path of a maximum given size. 

Definition 2. We consider an order $\sqsubseteq_{ls}$ on $\mathcal{LS}_D$ defined by: 

$(s_1, \ldots, s_m) \sqsubseteq_{ls} (s'_1, \ldots, s'_n)$ iff $eval(s'_n) = 0$ or $n \ge m$. 

Consider $p_1 = (a, b)$, $p_2 = (a, c)$ and $p_3 = (b)$, three elements of $\mathcal{LS}_D$ such that 
$eval(b) = 0$ (i.e., $b$ is a solution). Then, they all correspond to possible results of 
a local search of size 2, and they are equivalent w.r.t. Definition 2. 
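
The equivalence claimed in this example can be checked mechanically. The sketch below assumes the following reading of Definition 2: a path p is below a path q when q ends on a solution or q is at least as long as p. The function names and the violation counts assigned to the samples a, b and c are hypothetical.

```python
def ls_leq(p, q, eval_fn):
    """p below q: q ends on a solution, or q is at least as long as p (assumed reading)."""
    return eval_fn(q[-1]) == 0 or len(q) >= len(p)


def equivalent(p, q, eval_fn):
    """Two paths are equivalent when each is below the other."""
    return ls_leq(p, q, eval_fn) and ls_leq(q, p, eval_fn)


# Samples are opaque labels here; only b is a solution (eval = 0).
violations = {"a": 2, "b": 0, "c": 1}
eval_fn = violations.__getitem__

p1, p2, p3 = ("a", "b"), ("a", "c"), ("b",)
```

Under this reading, the three paths are pairwise equivalent, whereas the shorter path `("a",)` is strictly below `("a", "c")`.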

3.2 Computation Structure 

We now instantiate the abstract framework of K.R. Apt described in Section 2.1. 

Definition 3. A sampled CSP (sCSP) is defined by a triple $(D, C, p)$, a sample 
function $\varepsilon$, and a local search ordering $\sqsubseteq_{ls}$ where 

— $D = D_1, \ldots, D_n$ 

— $\forall c \in C,\ c \subseteq D_1 \times \cdots \times D_n$ 

— $p \in \mathcal{LS}_D$ 

Note that, in our definition, the local search path $p$ should be included in the 
box defined by $\varepsilon(D)$. We denote by $SCSP$ the set of sCSPs, and we now define an 
ordering relation on the structure $(SCSP, \sqsubseteq)$. 

Definition 4. Given two sCSPs $\psi = (D, C, p)$ and $\psi' = (D', C, p')$, 
$\psi \sqsubseteq \psi'$ iff $D' \subset D$ or ($D' = D$ and $p \sqsubseteq_{ls} p'$). 

This relation is extended to $2^{SCSP}$ as: 

$\{\varphi_1, \ldots, \varphi_k\} \sqsubseteq \{\psi_1, \ldots, \psi_l\}$ iff $\forall \varphi_i,\ (\exists \psi_j,\ \varphi_i \sqsubseteq \psi_j$ and $\nexists \psi_j,\ \psi_j \sqsubset \varphi_i)$ 

where $i \in [1..k]$, $j \in [1..l]$. 

Note that this ordering on sCSPs could be extended by also considering an order 
on constraints; this would enable constraint simplifications. 

We denote by $\Sigma CSP$ the set $2^{SCSP}$, which constitutes the key set of our 
computation structure. We denote by $\sigma$CSP an element of $\Sigma CSP$. The least element 
$\bot$ is $\{(D, C, p)\}$, i.e., the initial $\sigma$CSP to be solved. 

3.3 Notion of Solution 

Our framework is dedicated to CSP resolution and therefore we have to define 
precisely the notion of solution w.r.t. the previous computation structure. We 
should note that this notion is clear from each side of the resolution (i.e., com- 
plete and incomplete methods). From the complete resolution point of view, a 
solution of a CSP is a tuple which satisfies all the constraints. From the LS point 
of view, the notion of solution is related to the evaluation function $eval$, which 
defines a solution as an element $s$ of $\varepsilon(D)$ such that $eval(s) = 0$. 

Given an sCSP $\psi = (D, C, p)$, these two points of view induce two sets of 
solutions $Sol_D(\psi) = \{d \in D \mid \forall c \in C,\ d \in c\}$ and 
$Sol_{\mathcal{LS}_D}(\psi) = \{(s_1, \ldots, s_n) \in \mathcal{LS}_D \mid eval(s_n) = 0\}$. 

Definition 5. Given an sCSP $\psi = (D, C, p)$, the set of solutions of $\psi$ is defined 
by: 

$Sol(\psi) = \{(d, C, p) \mid d \in Sol_D(\psi) \text{ or } p \in Sol_{\mathcal{LS}_D}(\psi)\}$ 

This notion is extended to any $\sigma$CSP $\Psi$ as $Sol(\Psi) = \bigcup_{\psi \in \Psi} Sol(\psi)$. 

3.4 Reduction Functions Definitions and Properties 

We now have to define the notion of function on $\Sigma CSP$. Given an element 
$\Psi = \{\psi_1, \ldots, \psi_n\}$ of $\Sigma CSP$, we have to apply functions on $\Psi$ which correspond 
to domain reduction, domain splitting, and local search. These functions may 
operate on various elements of $\Psi$, and for each $\psi_i$ on some of its components. 
We should note that since we consider here finite initial CSPs, our structure is 
a finite partial ordering. 

Definition 6 (Domain reduction function). A domain reduction function 
is a function $red$ on $\Sigma CSP$ s.t. for all $\Psi = \{\psi_1, \ldots, \psi_n\} \in \Sigma CSP$, 
$red(\Psi) = \{\psi'_1, \ldots, \psi'_n\}$ and $\forall i \in [1..n]$: 

— either $\psi_i = \psi'_i$ 

— or $\psi_i = (D, C, p)$, $\psi'_i = (D', C, p')$ and $D \supset D'$ and $Sol_D(\psi_i) = Sol_{D'}(\psi'_i)$. 

Note that this condition ensures that $\Psi \sqsubseteq red(\Psi)$ and that the function 
is inflationary and monotonic on $(\Sigma CSP, \sqsubseteq)$. It allows one to reduce several 
domains of several sCSPs of a $\sigma$CSP at the same time. From a constraint 
programming point of view, no solution of the initial CSP is lost by a domain 
reduction function. This is also the case for domain splitting as defined below. 

Definition 7 (Domain splitting). A domain splitting function is a function 
$sp$ on $\Sigma CSP$ such that for all $\Psi = \{\psi_1, \ldots, \psi_n\} \in \Sigma CSP$: 

a. $sp(\Psi) = \{\psi'_1, \ldots, \psi'_m\}$ with $n \le m$, 

b. $\forall i \in [1..n]$, 

• either $\exists j \in [1..m]$ such that $\psi_i = \psi'_j$ 

• or there exist $\psi'_{j_1}, \ldots, \psi'_{j_h}$, $j_1, \ldots, j_h \in [1..m]$, such that 
$Sol_D(\psi_i) = \bigcup_{k=1..h} Sol_D(\psi'_{j_k})$. 

c. and, $\forall j \in [1..m]$, 

• either $\exists i \in [1..n]$ such that $\psi_i = \psi'_j$ 

• or $\psi'_j = (D', C, p')$ and there exists $\psi_i = (D, C, p)$, $i \in [1..n]$, such that 
$D \supset D'$. 

Conditions a. and b. ensure that some sCSPs have been split into sub-sCSPs by 
splitting their domains (one or several variable domains) into smaller domains 
without discarding solutions (defined by the union of solutions of the $\psi_i$). 
Condition c. ensures that the search space does not grow: every domain of the 
sCSPs composing $sp(\Psi)$ is included in the domain of some sCSP 
composing $\Psi$. Note that the domains of several variables of several sCSPs can be 
split at the same time. 

Definition 8 (Local Search). A local search function $\lambda_N$ is a function 

$\lambda_N : \Sigma CSP \to \Sigma CSP$ 

$\{\psi_1, \ldots, \psi_n\} \mapsto \{\psi'_1, \ldots, \psi'_n\}$ 

where 

— $N$ is the maximum number of consecutive moves 

— $\forall i \in [1..n]$ 

• either $\psi_i = \psi'_i$ 

• or $\psi_i = (D, C, p)$ and $\psi'_i = (D, C, p')$ with $p = (s_1, \ldots, s_k)$ and 
$p' = (s_1, \ldots, s_k, s_{k+1})$ such that $s_{k+1} \in \mathcal{N}(s_k) \cap D$ and $k + 1 \le N$. 

$N$ represents the maximum length of a local search path, i.e., the number of 
moves allowed in a usual local search process. A local search function can try 
to improve the sample of one or several sCSPs at once. Note that $\psi_i = \psi'_i$ may 
happen when: 

1. $p \in Sol_{\mathcal{LS}_D}(\psi_i)$: the last sample $s_n$ of the current local search path cannot 
be improved using $\lambda_N$, 

2. $n = N$: the maximum allowed number of moves has been reached, 

3. $\lambda_N$ is the identity function on $\psi_i$, i.e., $\lambda_N$ does not try to improve the local 
search path of the sCSP $\psi_i$. This might happen when no possible move can 
be performed (e.g., a descent algorithm has reached a local optimum or all 
neighbors are tabu in a tabu search algorithm [6]). 



3.5 Solving $\sigma$CSPs 

The complete solving of a $\sigma$CSP $\{(D_1, \ldots, D_n, C, p)\}$ now consists in 
instantiating the GI algorithm: 



Computation Structure. 

— the $\sqsubseteq$ ordering is instantiated by the ordering given in Definition 4, 

— $d := \bot$ corresponds to $d := \{(D_1, \ldots, D_n, C, p)\}$, the Cartesian product of 
the variable domains and of the sample from the sCSP, 

— F is a set of given monotonic and inflationary functions as defined in Sec- 
tion 3.4: domain reduction functions (extensions of usual domain reduction 
functions for CSPs), domain splitting functions (usual splitting mechanisms 
integrated as reduction functions), and local search functions (e.g., functions 
for descent, tabu search, . . . ). 



Functions. We propose here an instantiation of the function schemes presented 
in the previous section. 

From an operational point of view, reduction functions have to be applied on 
some selected sCSPs of a given $\sigma$CSP. Therefore we have to consider functions 
driven by a selection operator. Given a selection function $select : A \to 2^{B}$, let 
us consider a function $f^{select} : A \to C$ such that $f^{select}(x) = g(y)$, $y \in select(x)$, 
where $g : B \to C$. Therefore, $f^{select}$ can be viewed as a nondeterministic function. 
Formally, we may associate to any function $f^{select}$ a family of deterministic 
functions $(f_k)_{k>0}$ such that $\forall x \in A,\ \forall y \in select(x),\ \exists k > 0,\ f_k(x) = g(y)$. If we 
consider finite sets $A$ and $B$ then this family is also finite. 

This corresponds to the fact that all possible functions are needed for each 
$\sigma$CSP that can result from the application of some functions on the initial $\sigma$CSP 
to model the different possible executions of the resolution process³. 

We first define functions on $SCSP$ w.r.t. selection functions to select the 
domains on which the functions apply. Similarly, and in order to extend operations 
on $SCSP$ to $\Sigma CSP$, we introduce a selection process which allows us to extract 
particular sCSPs of a given $\sigma$CSP (see Figure 1). 

Let us consider a domain selection function $Sel_D : SCSP \to 2^{D}$ and an sCSP 
selection function $Sel_\psi : \Sigma CSP \to \Sigma CSP$. 

³ This is necessary in theory; in practice, however, only the required functions are 
fed into the GI algorithm. 

Fig. 1. Selection functions 



Domain Reduction. We may first define a domain reduction operator on a 
single sCSP as: 

$red^{Sel_D} : SCSP \to SCSP$ 
$\psi = (D, C, p) \mapsto (D', C, p')$ 

such that 

1. $D = \{D_1, \ldots, D_n\}$, $D' = \{D'_1, \ldots, D'_n\}$ and $\forall 1 \le i \le n$ 

- $D_i \in D \setminus Sel_D(\psi) \Rightarrow D'_i = D_i$ 

- $D_i \in Sel_D(\psi) \Rightarrow D'_i \subseteq D_i$ 

2. $p' = p$ if $p \in \mathcal{LS}_{D'}$; otherwise $p'$ is set to any sample chosen in $\varepsilon(D')$ 

Note that Condition 2 ensures that the local search path associated with the 
sCSP stays in $\varepsilon(D')$⁴. This function is extended to $\Sigma CSP$ as: 

$red^{Sel_\psi, Sel_D} : \Sigma CSP \to \Sigma CSP$ 

$\Psi \mapsto (\Psi \setminus Sel_\psi(\Psi)) \cup \bigcup_{\psi \in Sel_\psi(\Psi)} red^{Sel_D}(\psi)$ 



Splitting. We may first define a splitting operator on a single sCSP as: 

$sp_k^{Sel_D} : SCSP \to \Sigma CSP$ 

$\psi \mapsto \Psi'$ 

with $\psi = (D_1, \ldots, D_h, \ldots, D_n, C, p)$ where $\{D_h\} = Sel_D(\psi)$ and 
$\Psi' = \{(D_1, \ldots, D_{h_1}, \ldots, D_n, C, p_1), \ldots, (D_1, \ldots, D_{h_k}, \ldots, D_n, C, p_k)\}$ such that 

1. $D_h = \bigcup_{i=1}^{k} D_{h_i}$ 

2. for all $i \in [1..k]$, $p_i = p$ if $p \in \mathcal{LS}_{(D_1, \ldots, D_{h_i}, \ldots, D_n)}$; otherwise, $p_i$ is set to any 
sample chosen in $\varepsilon(D_1, \ldots, D_{h_i}, \ldots, D_n)$. 



⁴ Note that we could keep $p' = (s_i)$ where $s_i$ is the latest element of $p$ which belongs 
to $D'$, for instance, or a suitable sub-path of $p$. We have chosen to model here a 
restart from a randomly chosen sample after each reduction or splitting. 

For the sake of readability we consider here the split of a single domain in the 
initial sCSP, but it can obviously be extended to any selection function. The last 
condition is needed to satisfy the sCSP definition and corresponds to the fact 
that, informally, the samples associated with any sCSP belong to the box induced 
by their domains. 

$sp_k^{Sel_\psi, Sel_D} : \Sigma CSP \to \Sigma CSP$ 

$\Psi \mapsto (\Psi \setminus Sel_\psi(\Psi)) \cup \bigcup_{\psi \in Sel_\psi(\Psi)} sp_k^{Sel_D}(\psi)$ 

Local Search. As mentioned above, local search is viewed as the definition 
of a partial ordering $\sqsubseteq_{ls}$ which is used for the definition of the ordering $\sqsubseteq$ on 
$SCSP$. The remaining components to be defined are now: 1) the strategy to 
obtain a local search path $p'$ of length $n + 1$ from a local search path $p$ of length 
$n$, and 2) the stop criterion, which is usually based on a fixed limited number of 
LS moves and, in this particular context of CSP resolution, the notion of reached 
solution. We first define a local search operator on $SCSP$ based on a function 
$strat : SCSP \to 2^{\varepsilon(D)}$ which defines the choice strategy of a given local search 
heuristic in order to move from a sample to one of its neighbors. 

$\lambda_N^{strat} : SCSP \to SCSP$ 

$\psi \mapsto \psi'$ 

where 

— $N$ is the maximum number of moves 

— $\psi = (D, C, p)$ and $\psi' = (D, C, p')$ with $p = (s_1, \ldots, s_n)$ 

1. $p' = p$ if $p \in Sol_{\mathcal{LS}_D}(\psi)$ 

2. $p' = p$ if $n = N$ 

3. $p' = (s_1, \ldots, s_n, s_{n+1})$ s.t. $s_{n+1} \in strat(\psi) \cap D$ otherwise 
We provide here some examples of well-known “move” heuristics. 

— Descent: selects better neighbors 

$strat_d((D, C, (s_1, \ldots, s_n))) = \{s_{n+1} \in \varepsilon(D) \mid s_{n+1} <_{eval} s_n \wedge s_{n+1} \in \mathcal{N}(s_n)\}$ 

— Strict Descent: selects best improving neighbors 

$strat_{sd}((D, C, (s_1, \ldots, s_n))) = \{s_{n+1} \in \varepsilon(D) \mid s_{n+1} <_{eval} s_n \wedge s_{n+1} \in \mathcal{N}(s_n) \wedge \forall s' \in \mathcal{N}(s_n),\ s_{n+1} \le_{eval} s'\}$ 

— Random Walk: selects all the neighbors 

$strat_{rw}((D, C, (s_1, \ldots, s_n))) = \{s_{n+1} \in \varepsilon(D) \mid s_{n+1} \in \mathcal{N}(s_n)\}$ 

— Tabu of length $l$: selects best neighbor not visited during the past $l$ moves 

$strat_{tabu_l}((D, C, (s_1, \ldots, s_n))) = \{s_{n+1} \in \varepsilon(D) \mid \forall j,\ n - l \le j \le n,\ s_{n+1} \ne s_j \wedge s_{n+1} \in \mathcal{N}(s_n) \wedge \forall s' \in \mathcal{N}(s_n),\ s_{n+1} \le_{eval} s'\}$ 
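
The four move heuristics can be sketched as functions that return the set of candidate next samples. This is only an illustration under assumptions: paths are tuples, `neighbors` and `eval_fn` are caller-supplied, and the toy search space (integers 0 to 10 with target 7) is hypothetical. The tabu variant follows the prose description, i.e., it keeps the best neighbors among those not visited recently.

```python
import random


def descent(path, neighbors, eval_fn):
    """Neighbors of the last sample that strictly improve on it."""
    s = path[-1]
    return {t for t in neighbors(s) if eval_fn(t) < eval_fn(s)}


def strict_descent(path, neighbors, eval_fn):
    """Improving neighbors that are also best among all neighbors."""
    best = min((eval_fn(t) for t in neighbors(path[-1])), default=None)
    return {t for t in descent(path, neighbors, eval_fn) if eval_fn(t) == best}


def random_walk(path, neighbors, eval_fn):
    """Every neighbor is a candidate."""
    return set(neighbors(path[-1]))


def tabu(path, neighbors, eval_fn, length):
    """Best neighbors among those not among the last `length` visited samples."""
    recent = set(path[-length:]) if length > 0 else set()
    allowed = [t for t in neighbors(path[-1]) if t not in recent]
    if not allowed:
        return set()
    best = min(eval_fn(t) for t in allowed)
    return {t for t in allowed if eval_fn(t) == best}


def move(path, strat, max_moves, eval_fn, rng=random):
    """One local search step: keep the path if solved, exhausted, or stuck."""
    if eval_fn(path[-1]) == 0 or len(path) >= max_moves:
        return path
    candidates = sorted(strat(path))
    if not candidates:
        return path
    return path + (rng.choice(candidates),)


# Hypothetical toy search space: integers 0..10, the only solution is 7.
def neighbors(s):
    return {t for t in (s - 1, s + 1) if 0 <= t <= 10}


def eval_fn(s):
    return abs(s - 7)
```

Repeatedly applying `move` with the descent strategy from the path `(3,)` extends the path one sample at a time until the solution 7 is reached, after which further applications leave the path unchanged.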

Note that, again, these functions satisfy the required properties (inflationary 
and monotonic) to be fed into the GI algorithm. Then this function is extended 
to $\Sigma CSP$ as: 

$\lambda_N^{Sel_\psi, strat} : \Sigma CSP \to \Sigma CSP$ 

$\Psi \mapsto (\Psi \setminus Sel_\psi(\Psi)) \cup \bigcup_{\psi \in Sel_\psi(\Psi)} \lambda_N^{strat}(\psi)$ 

Combination. The combination strategy is now totally managed by the “choose 
function” of the GI algorithm: different schedulings of functions lead to the same 
result (in terms of least common fixpoint), but not with the same efficiency. 

Note that in practice, we are not always interested in reaching the fixpoint 
of the GI algorithm, but we can also be interested in solutions such as: sCSPs 
which contain a solution for local search or a solution for constraint propagation. 
In this case, different runs of the GI algorithm with different strategies (“choose 
function”) can lead to different solutions (e.g., in case of problems with several 
solutions, or several local minima). 



Result of the GI Algorithm. We now compare the result of the GI algorithm 
w.r.t. Definition 5 of the solution of a $\sigma$CSP. 

Since we are in Apt’s framework (concerning orderings and functions), given 
a $\sigma$CSP $\Psi$ and a set $F$ of reduction functions (as defined above), the GI algorithm 
computes a common least fixpoint of the functions in $F$. Clearly, this fixpoint 
$glfp(\Psi)$ abstracts all the solutions of $Sol(\Psi)$: 

— $\bigcup_{(d, C, p) \in Sol(\Psi)} d \subseteq \bigcup_{(d, C, p) \in glfp(\Psi)} d$ 

— for all $(D, C, p) \in Sol(\Psi)$ s.t. $p = (s_1, \ldots, s_n) \in Sol_{\mathcal{LS}_D}(\Psi)$ there exists a 
$(d, C, p') \in glfp(\Psi)$ s.t. $s_n \in \varepsilon(d)$. 

The first item represents the fact that all domain reduction and splitting func- 
tions used in GI preserve solutions. The second item ensures that all solutions 
computed by LS functions are in the fixpoint of the GI algorithm. 

In practice, one can stop the GI algorithm before the fixpoint is reached. For 
example, one can compute the fixpoint of the LS functions; in this case, only 
some applications of the CP functions can reduce the search space (and thus, the 
possible moves) . This corresponds to the hybrid nature of the resolution process 
and the tradeoff between a complete and incomplete exploration of the space. 

4 Experimentation 

In this section, we present a prototype, developed in C++, which allows us to 
test hybridization on different CSP examples. 

4.1 Functions and Strategies 

We choose $\varepsilon(D_1, \ldots, D_n)$ as the Cartesian product $D_1 \times \cdots \times D_n$ ($\varepsilon(D) = D$). We 
consider the two selection functions $min(\Psi) = \{\psi\} \subseteq \Psi$ such that $\forall \psi' \in \Psi,\ \psi \sqsubseteq \psi'$ 
and $max(D) = \{D_i\}$ such that $\forall j \ne i,\ |D_i| \ge |D_j|$ (if there are several possible 
candidates, choose the one with the smallest index). A first set of domain reduction 
functions $DR$ contains node and bound consistency operators (see [10]). The 
set $SP$ contains the splitting operators $sp_2^{min, max}$, which consist in cutting in 
two the largest domain of the minimal element of a $\sigma$CSP. Finally, the set $LS$ 
contains functions which correspond to a tabu method: $\lambda_N^{min, tabu_l}$. 

The purpose of this section is not to test a high-performance algorithm on 
large-scale benchmarks but to show the interest of our framework on various 
small problems. 

According to the generic algorithm GI, note that one has to define a choose 
strategy at each iteration and to update the set of functions. Here we describe 
three different choose functions: 

— $DR \cup SP$: in this case we consider the initial set of functions $F = DR \cup SP$; 
then the choose function is defined as: choose any $g \in G \cap DR$, or any $g \in G$ 
if $G \cap DR = \emptyset$ ($G$ being the set of functions still to be applied in the GI 
algorithm). This simulates a complete backtracking algorithm. 

— $LS$: here we consider $F = LS$. Note that in this case there is only one LS 
function for tabu search. We have also experimented with a descent with random 
walk algorithm. In that case, reduction functions corresponding to descent 
and random walk strategies are applicable on each sCSP and one has to 
choose one or the other with a certain probability. 

— $DR \cup SP \cup LS$: we consider $F = DR \cup SP \cup LS$. The choose function is: 
while $G \cap DR \ne \emptyset$ choose $g \in G \cap DR$; then choose any $g \in SP$; then while 
$G \cap LS \ne \emptyset$ choose $g \in G \cap LS$. In other words, this strategy is: perform all 
possible domain reductions, then make a split, then a full local search; and 
iterate on this process. 
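
The role of the choose function can be illustrated with a simplified iteration loop. The sketch below is not the GI algorithm of Section 2.1 itself: it assumes a conservative update in which every function is rescheduled whenever the current element changes, and the function names, the kind labels and the toy reductions on a set of integers are hypothetical.

```python
def generic_iteration(bottom, functions, choose):
    """Apply (kind, f) pairs until none of them changes the current element.

    Assumes every f is inflationary and monotonic on a finite ordering,
    so the loop terminates.
    """
    d = bottom
    pending = list(functions)
    while pending:
        g = choose(pending)
        pending.remove(g)
        new_d = g[1](d)
        if new_d != d:
            d = new_d
            pending = list(functions)  # conservative update: reschedule everything
    return d


def choose_dr_first(pending):
    """Prefer a domain reduction function when one is still pending."""
    for item in pending:
        if item[0] == "DR":
            return item
    return pending[0]


# Hypothetical toy reductions on a domain represented as a frozenset of integers.
def drop_large(dom):
    return frozenset(v for v in dom if v <= 5)


def drop_odd(dom):
    return frozenset(v for v in dom if v % 2 == 0)


functions = [("DR", drop_large), ("OTHER", drop_odd)]
```

Both the DR-first strategy and a strategy that always picks the last pending function reach the same fixpoint on this toy domain; only the number of function applications differs.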



Selected Problems. We propose various problems : S+M=M, a well-known 
crypto-arithmetic problem which consists in solving the equation SEND + 
MORE = MONEY by assigning a different digit to each letter; Marathon, a 
problem of arrival in a race knowing particular information about the competi- 
tors (6 variables and 29 constraints); Scheduling problem (15 variables and 
29 constraints); classical Magic Square, Langford number and N-queens 
benchmarks. 

4.2 Experimental Results 

The values given in the following table correspond to the truncated average 
of 50 independent runs. For $DR \cup SP$, we count the number of 
iterations of splitting operators, which corresponds to the number of nodes in a 
classical backtracking algorithm. For $LS$, we count the number of applied 
functions, which corresponds to the number of moves performed by the local 
search. We also report the success rate. The computation time (t) is given in 
seconds. For $DR \cup SP \cup LS$, we limit the number of moves for each local search 
to $N = 100$, while for $LS$ alone, a maximum of 500,000 moves is allowed. The tabu 
list length $l$ is set to 10. 

We first compare the number of nodes needed by $DR \cup SP$ and $DR \cup SP \cup LS$ to 
get the first solution. Table 1 shows that $DR \cup SP \cup LS$ finds a solution with a 
smaller number of nodes than $DR \cup SP$ alone. In the combination, the 
relative efficiency of the LS part depends on the problem. For problems with 
one solution such as S+M=M, the benefit is less significant than for the other 
benchmarks (in particular N-queens). 

Table 1. First solution ($DR \cup SP$ vs. $DR \cup SP \cup LS$ vs. $LS$); t in seconds 

                  $DR \cup SP$          $DR \cup SP \cup LS$          $LS$ 
Problem           nodes     t         nodes   moves   t       s. rate   moves   t 
S+M=M             10        0.0       5.4     375.2   0.0     44%       59294   5.6 
Marathon          8         0.0       1       11.5    0.0     100%      18      0.0 
Scheduling        31        0.0       1       2.5     0.0     8%        4726    59.1 
24-queens         3329800   1631.2    1.2     62.8    0.6     100%      201.7   2.1 
Magic-Square-4    39        0.0       13.5    743     0.2     100%      3423    2.0 
Langford 2 4      4         0.0       1.6     95      0.0     100%      601     0.0 


Table 1 also shows the efficiency of the local consistency which guides the local 
search (see comparisons $DR \cup SP \cup LS$ vs. $LS$), in particular on S+M=M and 
Scheduling. One should remark that the success rate of $LS$ is also an important 
parameter, which is clearly improved by the combination since, in that case, hybrid 
solving always succeeds in finding one solution. 

Computation time is, of course, strongly related to the implementation of 
the different operators. We may remark that the computation time is improved by 
using $DR \cup SP \cup LS$ instead of $LS$ alone. The comparison between $DR \cup SP$ and 
$DR \cup SP \cup LS$ is not really significant except on N-queens, where the hybridization 
provides an important saving of time. 

Finally, we have calculated the computing cost needed to get several solutions 
with $DR \cup SP \cup LS$ on a Magic-Square-3 problem which has 8 solutions. The 
mechanisms progress together to get a set of distinct solutions by computing 
fewer nodes (about 25% fewer for each solution) than a classical backtracking 
algorithm simulated by $DR \cup SP$. Getting several distinct solutions is a real 
asset compared to $LS$ alone (which was not designed for computing different 
solutions) and could be very interesting for a number of problems. 

Lastly, similar experiments have been performed with a hill climbing 
algorithm and led to similar conclusions. 

5 Perspectives and Conclusion 

Most hybrid approaches are ad hoc algorithms based on a master-slave 
combination: they favor the development of systems whose efficiency is strongly related 
to a given class of CSPs. 

In this paper, we have presented a global model which is more suitable for 
integrating different strategies of combination and for proving some properties 
of these combinations. We have shown that this work can serve as basis for the 
integration of LS and CP methods in order to highlight the connections between 
complete and incomplete techniques and their main properties. 

In the future, we plan to extend our framework in order to handle optimization 
problems. From an LS point of view, this will change the strat functions used 
to create the reduction functions. From a CP point of view, algorithms such as 
Branch and Bound require adding new constraints during resolution: this could 
be done using a new type of reduction function in our model. 

Another future extension is to provide some “tools” to help design finer 
strategies in the GI algorithm. To this end, we plan to extend the work of [7], 
where strategies are built using some composition operators in the GI algorithm. 
Moreover, this will also open possibilities of concurrent and parallel application 
of reduction functions inside our model. 

Finally, we plan to complete our prototype implementation (Section 4) into 
a “fully” generic implementation of our framework in order to design and test 
new and finer strategies of hybridization. 

Acknowledgement. The authors are grateful to Willem Jan van Hoeve and to the 
anonymous referees for their useful comments. 

Abstract. Arc-consistency has been one of the most popular consistency 
techniques for space pruning in solving constraint satisfaction problems (CSPs), while 
lookahead appears to be its counterpart in answer set solvers. In this paper, we 
perform a theoretical comparison of their pruning powers, based on the 
translation of Niemelä from CSPs to answer set programs. In particular, we show two 
results. First, we show that lookahead is strictly stronger than arc-consistency. 
The extra pruning power comes from the ability to propagate unique values for 
variables, also called unit propagation in this paper, so that conflicts may be 
detected. This suggests that arc-consistency can be enhanced with unit propagation 
for CSPs. We then formalize this technique and show that, under the translation 
of Niemelä, it has exactly the same pruning power as lookahead. 



1 Introduction 

Among constraint programming frameworks, two have been given a lot of attention. 
One is the framework of constraint satisfaction problems (CSPs), and the other propo- 
sitional satisfiability. Logic programming with the stable model semantics [6] can be 
viewed as a variant of the latter under a non-conventional semantics. 

CSPs are typically solved by a systematic backtracking search algorithm, while at 
each choice point consistency of a certain kind is maintained in order to prune the search 
space. A general notion of consistency is called $k$-consistency, which requires that any 
partial solution for any $k - 1$ variables be consistently extended to a partial solution 
with any additional variable [4, 10]. The most popular degree of consistency is 
arc-consistency. A binary constraint is arc-consistent if and only if any assignment to any 
one of the variables in the constraint can be extended to a consistent assignment for the 
other variable. Many algorithms for maintaining arc-consistency for binary constraints 
have been developed and the optimal worst case complexity is known (cf. [2]). 

Developed largely in parallel, satisfiability (SAT) solvers and later answer set solvers 
provide another framework for solving constraints, where a constraint problem is 
represented by clauses for the former and rules with default negation for the latter. Most 
complete solvers in this camp are based on the DP procedure (the 
Davis-Putnam-Logemann-Loveland algorithm) [1]. One of the main ideas for space pruning is that of 
lookahead: before a decision on a choice point is made, for each atom, if fixing the atom’s truth 
value leads to a contradiction, the atom should then get the opposite truth value (cf. 
[3]). In this way, the atom gets a truth value propagated from already assigned atoms 
without going through a search process. In trying to derive a contradiction, fast 
constraint propagation algorithms are used. In SAT solvers the most popular technique is 
unit propagation, and in answer set programming, a representative is Smodels’ expand 
function [13]. 

B. Demoen and V. Lifschitz (Eds.): ICLP 2004, LNCS 3132, pp. 314-328, 2004. 

Walsh [15] compares several encodings between CSPs and SAT. Generally speak- 
ing, unit propagation is weaker than arc-consistency. Gent [7], in extending an idea 
from Kasif [8], shows that the ‘support encoding’ of binary CSPs into SAT can have 
arc-consistency established by unit propagation. This is an interesting result. Essen- 
tially, for a variable x to take a value a, the support encoding encodes the set of values 
of the other variable that allows it. As the support for a value is encoded in SAT, unit 
propagation is sufficient for enforcing arc-consistency. 

However, to our knowledge, lookahead as defined for SAT or answer set solvers has 
never been related to arc-consistency in the past. The question arises as to how the pruning 
power of lookahead compares to that of arc-consistency. If one is more powerful than 
the other, then why? Can the other be extended to make up the gap? Is such an extension 
new, or already present in the literature? 

In this paper, we will answer these questions based on answer set programming, 
under some assumptions. The first assumption is that we only deal with binary constraints 
in this paper. We see no major technical hurdles in extending our results to $k$-ary 
constraints; nevertheless we will leave the details to the future version of this work. The 
second assumption is that we will fix the encoding from CSPs to answer set programs 
to be the one by Niemelä [12]. This encoding is probably the most straightforward 
translation from CSPs to answer set programs. In particular, no support information is 
explicitly encoded. Lastly, we shall choose a specific algorithm of lookahead for 
comparison at the detailed, technical level. We will thus fix the answer set programming 
system to be the system of Smodels [14]. However, we will rely only on normal logic 
programming in Smodels. 

Under these assumptions, we show that lookahead prunes more search space than 
arc-consistency¹. This result leads to the insight as to what is missing in arc-consistency. 
This insight enables us to formalize the notion of arc-consistency with unique value 
propagation, or just unit propagation, a term borrowed from SAT, for CSPs. The 
extended algorithm turns out to have the same pruning power as lookahead. 
Arc-consistency with unit propagation appears to be new in the literature of CSPs; it differs from 
the notion of arc-consistency look ahead [2], which enforces full arc-consistency on all 
uninstantiated variables after each tentative assignment to the current variable is made. 

The paper is organized as follows. The next section introduces CSPs. Section 3 
introduces answer set programming and the lookahead algorithm as given in Smodels. In 
Section 4 we give Niemelä’s translation and prove that lookahead is strictly stronger 
than arc-consistency. Then, in Section 5 we show the main result of this paper, namely 
that arc-consistency plus unit propagation prunes exactly the same search space as does 
lookahead. In Section 6 we discuss some related work. Section 7 discusses some 
complexity results. We present an optimal worst case complexity result for lookahead. We 
then present a bound for computing the lookahead for the programs translated from 
CSPs by the method of Niemelä. Finally, Section 8 comments on the directions of 
future work. 



¹ For the convenience of comparison, arc-consistency here technically refers to arc-consistency 
combined with node-consistency. 

2 Constraint Satisfaction Problem 

A constraint satisfaction problem (CSP) is a triple $A(X, D, C)$ where $X = \{x_1, \ldots, x_n\}$ 
is a finite set of variables with respective domains $D = \{D_{x_1}, \ldots, D_{x_n}\}$ listing 
possible values for each variable, and $C = \{c_1, \ldots, c_m\}$ is a finite set of constraints. A 
constraint $c_i \in C$ is a relation defined on a subset $S_i$ of $X$; $S_i$ is called the scope of $c_i$. 
We may write $c_i(y_1, \ldots, y_n)$ to denote the scheme of the $n$-ary constraint $c_i$. 

A solution to a CSP $A(X, D, C)$ is an assignment where each variable is assigned 
a value from its domain such that all the constraints are satisfied. A constraint $c_i$ is 
satisfied if and only if the assignment to variables in the scope of $c_i$ yields a tuple in the 
relation $c_i$. We denote by $x \to a$ that variable $x$ is assigned value $a \in D_x$. 

Two CSPs are said to be equivalent if and only if they have the same set of solutions. 
A constraint $c_i$ is partially instantiated if zero or more variables in its scope have been 
assigned a value from their respective domains. A partial instantiation/assignment is 
consistent with constraint $c_i$ if and only if the assignment yields a projection of a tuple 
in the relation of $c_i$. A partial assignment is consistent if and only if it is consistent with 
every constraint. A consistent partial assignment is also called a partial solution. 

A CSP is typically solved by a backtrack algorithm that incrementally attempts 
to extend a partial solution to a complete solution. At each step, the current variable 
is assigned a value. If the partial assignment is consistent, we continue with the next 
variable; if not, we continue with the next value in the domain of the current variable; 
if all the values in the domain have been tried and fail to give a consistent assignment, 
we backtrack to the preceding variable and try alternative values in its domain. The 
search space can be pruned in a backtrack algorithm by employing domain reduction 
operations. The idea is to reduce the domains of the unassigned variables by maintaining 
a certain type of consistency before committing to a choice. The reduction to an empty 
domain causes the algorithm to backtrack. During backtracking, domain reductions are 
undone to the point where an alternative instantiation of a variable is sought. 

In this paper, we generally assume binary constraints if not said otherwise. 



2.1 Consistency Techniques 

The most interesting cases of $k$-consistency are node-consistency (when $k = 1$) and 
arc-consistency (when $k = 2$). Maintaining node-consistency is simple. Many algorithms 
for maintaining arc-consistency have been developed (cf. [2, 11]). In this section, we are 
interested more in what such an algorithm accomplishes than in its efficiency. For this 
reason, we describe an abstract, nondeterministic procedure that combines node and arc 
consistency, and simply call it AC. 

Algorithm AC 

AC takes as input a CSP $A(X, D, C)$, and performs the following domain reduction 
operations repeatedly until no domain can be further reduced. 

1. For any $c \in C$, if there is exactly one uninstantiated variable $x$ in its scope, remove 
$d$ from $D_x$ if $d$ is inconsistent with the value of the instantiated variable in $c$. 

2. For any $c \in C$ where both variables $x$ and $y$ in the scope of $c$ are uninstantiated, 
remove any $d$ from $D_x$ for which there exists no value from $D_y$ that, together with 
$d$, is consistent with $c$. 

Example 1. Consider a CSP as described by the constraints $\{x < y,\ y < z,\ z \le 3\}$, 
where $D_x = D_y = \{1, 2, 3\}$ and $D_z = \{2, 3, 4\}$. Enforcing node-consistency w.r.t. 
the last constraint reduces $D_z$ to $D'_z = \{2, 3\}$. Enforcing arc-consistency w.r.t. the 
first constraint reduces $D_x$ to $D'_x = \{1, 2\}$. Similarly, $D_y$ is reduced to $D'_y = \{2, 3\}$. 
Working on the constraint $y < z$ for variable $y$, $D'_y$ is further reduced to $\{2\}$, which 
forces $D'_x$ to become $\{1\}$ by the first constraint, and $D'_z$ to become $\{3\}$ by the 
constraint $y < z$. By now, no further reduction is possible. ■ 
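
The reductions of Example 1 can be reproduced with a small fixpoint loop. This is a minimal sketch in the style of a naive arc-consistency procedure, not AC-4; the function names and the encoding of constraints as predicates are assumptions made for the illustration, and the last constraint is read as $z \le 3$.

```python
def revise(domains, x, y, allowed):
    """Drop values of x that have no supporting value in y; report any change."""
    supported = {a for a in domains[x] if any(allowed(a, b) for b in domains[y])}
    changed = supported != domains[x]
    domains[x] = supported
    return changed


def arc_consistency(domains, unary, binary):
    """Enforce node consistency once, then arc consistency until a fixpoint."""
    domains = {v: set(vals) for v, vals in domains.items()}
    for x, pred in unary:
        domains[x] = {a for a in domains[x] if pred(a)}
    arcs = []
    for x, y, pred in binary:
        arcs.append((x, y, pred))
        # Reverse arc: the first argument is now the value of y.
        arcs.append((y, x, lambda b, a, pred=pred: pred(a, b)))
    changed = True
    while changed:
        changed = False
        for x, y, allowed in arcs:
            if revise(domains, x, y, allowed):
                changed = True
    return domains


def less_than(a, b):
    return a < b


result = arc_consistency(
    {"x": {1, 2, 3}, "y": {1, 2, 3}, "z": {2, 3, 4}},
    unary=[("z", lambda v: v <= 3)],
    binary=[("x", "y", less_than), ("y", "z", less_than)],
)
```

An instantiated variable can be modeled here as a singleton domain, so the first reduction rule of AC is covered by the same `revise` step.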

In the sequel, given a CSP, we will denote by $k$ the maximum domain size and by $e$ 
the number of constraints. It is known that for binary CSPs, the optimal worst case 
complexity is $O(ek^2)$. This is the worst case complexity of the algorithm, named AC-4, 
for checking and enforcing arc-consistency [2]. In the following, we also assume there 
are as many constraints as variables in $A(X, D, C)$, so that we can simply refer to $e$ for 
discussion of complexity. 



3 Constraint Propagation in Smodels 

In this section, we introduce constraint propagation in Smodels for normal logic
programs, which consist of rules of the form

$A \leftarrow B_1, \ldots, B_m, \text{not } C_1, \ldots, \text{not } C_n.$

where $A$, $B_i$ and $C_j$ are function-free atoms, and not is called default negation.
The head $A$ may be omitted, in which case the rule serves as a constraint whose body must
be false in any model. In systems like Smodels [14] and DLV [9], these programs are
instantiated to ground instances for the computation of stable models. In the remainder
of this paper, we only deal with ground programs.

The stable model semantics is defined over the ground instantiation of a given logic
program in two stages [6]. The idea is that one guesses a set of atoms, and then tests
whether the atoms in the set are precisely the consequences. In the first stage, given a
program $P$ and a set of atoms $M$, the reduct of $P$ w.r.t. $M$ is defined as:

$P^M = \{a \leftarrow b_1, \ldots, b_m \mid a \leftarrow b_1, \ldots, b_m, \text{not } c_1, \ldots, \text{not } c_n \in P \text{ and } \forall i \in [1..n],\ c_i \notin M\}$

Since $P^M$ is a program without default negation, its deductive closure,
$\{\phi \mid P^M \vdash \phi,\ \phi \text{ is an atom in } L\}$, is the least model of $P^M$. Then, $M$ is a stable
model of $P$ iff $M$ is the least model of $P^M$.
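
The two-stage definition can be checked mechanically on small ground programs. The following sketch is illustrative only: the rule encoding as `(head, positive_body, negative_body)` triples and the function names are assumptions of this example, and headless constraint rules are not handled.

```python
def reduct(program, M):
    """Stage 1: drop rules with some 'not c' where c is in M; strip the other 'not' literals."""
    return [(h, pos) for h, pos, neg in program if not (set(neg) & set(M))]


def least_model(positive_program):
    """Deductive closure of a program without default negation."""
    model, changed = set(), True
    while changed:
        changed = False
        for h, pos in positive_program:
            if h not in model and set(pos) <= model:
                model.add(h)
                changed = True
    return model


def is_stable(program, M):
    """Stage 2: M is stable iff it equals the least model of the reduct."""
    return least_model(reduct(program, M)) == set(M)


# p <- not q.   q <- not p.
program = [("p", [], ["q"]), ("q", [], ["p"])]
for candidate in (set(), {"p"}, {"q"}, {"p", "q"}):
    print(sorted(candidate), is_stable(program, candidate))
```

For this program the guesses {p} and {q} pass the test, while the empty set and {p, q} do not.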

Additional Notations: Atoms and default negations are both called literals. A set of
literals is consistent if there is no atom $\phi$ such that $\phi$ and not $\phi$ are both in the set.
$\mathit{Atoms}(\Phi)$ denotes the set of distinct atoms appearing in $\Phi$, where $\Phi$ is any syntactic
entity. The expression not(not $\phi$) is identified with $\phi$. Given a set of literals $B$,
$B^+ = \{\phi \mid \phi \text{ is an atom in } B\}$ and $B^- = \{\phi \mid \text{not } \phi \in B\}$. Let $Q$ be a set of atoms.
Then $\mathit{not}(Q) = \{\text{not } \phi \mid \phi \in Q\}$.

Constraint propagation in Smodels is carried out by the function lookahead(P, A),
where P is a program and A a set of literals representing a partial truth value
assignment. Given A, lookahead(P, A) picks each unassigned atom q and assumes
a truth value for it; if this leads to a conflict, we know that q must get the opposite truth
value in extending A. Truth values are propagated in lookahead by a function called
expand(P, A), which returns a superset of A, representing the process of propagating
the values of the atoms in A to some additional atoms. These functions have been
proved correct in the sense that any stable model M of program P that agrees
with A (meaning $A^+ \subseteq M$ and $A^- \cap M = \emptyset$) must agree with the set returned by such
a function [13].

The functions expand and lookahead are given in Figures 1 and 2, respectively. 



Function expand(P, A)
    repeat
        A' := A
        A := Atleast(P, A)
        A := $A \cup \{\text{not } \phi \mid \phi \in \mathit{Atoms}(P) \text{ and } \phi \notin \mathit{Atmost}(P, A)\}$
    until A = A'
    return A.

Fig. 1. Function expand(P, A)



Function lookahead(P, A)
    repeat
        A' := A
        A := lookahead_once(P, A)
    until A = A'
    return A.

Function lookahead_once(P, A)
    B := Atoms(P) - Atoms(A)
    B := $B \cup \mathit{not}(B)$
    while $B \neq \emptyset$ do
        take any literal $x \in B$
        A' := expand(P, $A \cup \{x\}$)
        B := B - A'
        if conflict(P, A') then
            return expand(P, $A \cup \{\mathit{not}(x)\}$)
    end while
    return A.

Fig. 2. Function lookahead(P, A)



Atleast(P, A) in the expand function returns a superset of A by repeatedly applying
four propagation rules until no new literals can be deduced. Let $r$ be a rule in program
$P$ of the form $r = h \leftarrow a_1, \ldots, a_n, \text{not } b_1, \ldots, \text{not } b_m$. Define

$\mathit{min}_r(A) = \{h \mid \{a_1, \ldots, a_n\} \subseteq A^+,\ \{b_1, \ldots, b_m\} \subseteq A^-\}$
$\mathit{max}_r(A) = \{h \mid \{a_1, \ldots, a_n\} \cap A^- = \emptyset,\ \{b_1, \ldots, b_m\} \cap A^+ = \emptyset\}$

1. If $r \in P$, then $A := A \cup \mathit{min}_r(A)$.

2. If there is an atom $a$ such that for all $r \in P$, $a \notin \mathit{max}_r(A)$, then $A := A \cup \{\text{not } a\}$.

3. If an atom $a \in A$, there is only one $r \in P$ for which $a \in \mathit{max}_r(A)$, and there is a
literal $x$ such that $a \notin \mathit{max}_r(A \cup \{x\})$, then $A := A \cup \{\text{not } x\}$.

4. If not $a \in A$ and there is a literal $x$ such that for some $r \in P$, $a \in \mathit{min}_r(A \cup \{x\})$,
then $A := A \cup \{\text{not } x\}$.

Propagation rule 1 adds the head of a rule to A if the body is true in A. Rule 2 says
that if there is no rule with a as the head whose body is not false w.r.t. A, then a cannot
be derived by any (consistent) extension of A, and thus cannot be in any stable model
agreeing with A. Rule 3 says that if $a \in A$, the only rule with a as the head whose body is
not false must have its body true in A. Rule 4 forces the body of a rule to be false if the
head is false in A.

The function Atmost(P, A) is defined as the least fixpoint of a function $f(B)$ such
that, given a rule $r = h \leftarrow a_1, \ldots, a_n, \text{not } b_1, \ldots, \text{not } b_m$ in $P$, $f(B)$ contains $h$ if the
$a_i$'s are in $B - A^-$, and none of the $b_j$'s is in $A^+$. By including not $\phi$ if $\phi \notin \mathit{Atmost}(P, A)$,
Atmost helps determine unfounded atoms in the same way as in computing the
well-founded model [5].

Finally, in lookahead_once, the function conflict(P, A) returns true if $A^+ \cap A^- \neq \emptyset$,
and false otherwise.
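
To make the interplay of Atleast, Atmost, expand and lookahead concrete, here is a naive executable sketch of the definitions above. It is not the Smodels implementation and makes no attempt at efficiency; the encoding (atoms as strings, a literal not a as the tuple `("not", a)`, rules as `(head, positive_body, negative_body)` triples) and all Python function names are assumptions of this example.

```python
def atoms_of(P):
    return {a for h, pb, nb in P for a in (h, *pb, *nb)}

def split(A):
    """Return (A+, A-) for a set of literals."""
    return ({x for x in A if isinstance(x, str)},
            {x[1] for x in A if isinstance(x, tuple)})

def complement(x):
    return x[1] if isinstance(x, tuple) else ("not", x)

def body(rule):
    _, pb, nb = rule
    return set(pb) | {("not", b) for b in nb}

def atleast(P, A):
    A = set(A)
    while True:
        before = set(A)
        pos, negs = split(A)
        for r in P:                                  # rule 1: body true, add head
            if body(r) <= A:
                A.add(r[0])
        for a in atoms_of(P):
            live = [r for r in P if r[0] == a
                    and not set(r[1]) & negs and not set(r[2]) & pos]
            if not live:                             # rule 2: every body is false
                A.add(("not", a))
            elif a in pos and len(live) == 1:        # rule 3: unique remaining rule
                A |= body(live[0])
        for r in P:                                  # rule 4: head false, one literal open
            missing = body(r) - A
            if r[0] in negs and len(missing) == 1:
                A.add(complement(missing.pop()))
        if A == before:
            return A

def atmost(P, A):
    pos, negs = split(A)
    B = set()
    while True:                                      # least fixpoint of f(B)
        new = {h for h, pb, nb in P
               if set(pb) <= B - negs and not set(nb) & pos}
        if new <= B:
            return B
        B |= new

def expand(P, A):
    A = set(A)
    while True:
        before = set(A)
        A = atleast(P, A)
        A |= {("not", a) for a in atoms_of(P) - atmost(P, A)}
        if A == before:
            return A

def conflict(A):
    pos, negs = split(A)
    return bool(pos & negs)

def lookahead_once(P, A):
    pos, negs = split(A)
    B = atoms_of(P) - pos - negs
    B |= {("not", a) for a in B}
    while B:
        x = B.pop()
        A2 = expand(P, A | {x})
        B -= A2
        if conflict(A2):
            return expand(P, A | {complement(x)})
    return A

def lookahead(P, A):
    A = set(A)
    while True:
        before = set(A)
        A = lookahead_once(P, A)
        if A == before:
            return A

# a <- not b.  b <- not a.  c <- a.  c <- b.
P = [("a", [], ["b"]), ("b", [], ["a"]), ("c", ["a"], []), ("c", ["b"], [])]
print(expand(P, set()), lookahead(P, set()))
```

On this four-rule program expand deduces nothing from the empty assignment, whereas lookahead derives c: assuming not c forces not a and not b by rule 4, after which rule 1 derives a and b, a conflict.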

Some lemmas are needed later in this paper. Lemma 1 below is proved in Simons'
thesis [13]. Lemma 1 is used to prove Lemma 2, which has been known as folklore.
Lemma 3 appears to be new and will be used later to prove some complexity
results. The proofs of these lemmas are omitted here.

Lemma 1. expand(P, A) is monotonic in the parameter A.

Lemma 2. lookahead(P, A) returns the same set, independent of the order in which
literals are chosen from B in the while loop of lookahead_once(P, A).

Given a logic program P, it is convenient to construct a dependency graph for P:
for each rule $a \leftarrow b_1, \ldots, b_m, \text{not } c_1, \ldots, \text{not } c_n$ in P, there is a positive edge from $a$
to each $b_i$, $1 \le i \le m$, and a negative edge from $a$ to each $c_j$, $1 \le j \le n$. P has a
positive loop if in its dependency graph there is a path from an atom to itself which only
contains positive edges.

Lemma 3. If a program P has no positive loops, then expand(P, A) can be computed 
in time linear in the size of P. 
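
The precondition of Lemma 3 is easy to test. The sketch below builds only the positive edges of the dependency graph and looks for a cycle by depth-first search; the rule encoding `(head, positive_body, negative_body)` and the name `has_positive_loop` are assumptions of this illustration.

```python
def has_positive_loop(program):
    """True iff some atom reaches itself through positive edges only."""
    graph = {}
    for head, pos, _neg in program:
        graph.setdefault(head, set()).update(pos)

    state = {}  # atom -> "active" (on the DFS stack) or "done"

    def visit(atom):
        if state.get(atom) == "active":
            return True           # back edge: a positive cycle
        if state.get(atom) == "done":
            return False
        state[atom] = "active"
        found = any(visit(b) for b in graph.get(atom, ()))
        state[atom] = "done"
        return found

    return any(visit(a) for a in list(graph))


print(has_positive_loop([("b", ["b"], [])]))                       # b <- b.
print(has_positive_loop([("p", [], ["q"]), ("q", [], ["p"])]))     # only negative edges
```

Negative edges are ignored on purpose, so a program such as p <- not q, q <- not p has no positive loop even though p and q depend on each other.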



4 AC Versus Lookahead 



In this section we first present Niemelä's translation [12], and then show that lookahead
is strictly stronger than AC.

4.1 Translation from CSPs to Logic Programs

Niemelä's translation is not limited to binary constraints, so we present this
translation for n-ary constraints. Let A(X, D, C) be an n-ary CSP.

The translation of Niemelä [12] consists of three parts. The first part specifies the
uniqueness property: a variable in a CSP can be assigned only one value. For each
variable $x_i \in X$ and its domain $D_{x_i} = \{a_1, \ldots, a_l\}$, we use an atom, $x_i(a_j)$, to represent
whether or not $x_i$ gets the value $a_j$. Thus, for each $1 \le j \le l$, we add a rule

$x_i(a_j) \leftarrow \text{not } x_i(a_1), \ldots, \text{not } x_i(a_{j-1}), \text{not } x_i(a_{j+1}), \ldots, \text{not } x_i(a_l).$ (1)

In the second part, constraints are expressed. For each constraint scheme $c_i(x_1, \ldots, x_n)$
where $c_i \in C$, and each tuple $(a_1, \ldots, a_n) \in c_i$, we add

$\mathit{sat}(c_i) \leftarrow x_1(a_1), \ldots, x_n(a_n).$ (2)

Finally, we express that every constraint in $C = \{c_1, \ldots, c_m\}$ must be satisfied by

$\mathit{sat} \leftarrow \mathit{sat}(c_1), \ldots, \mathit{sat}(c_m).$ (3)

$f \leftarrow \text{not } f, \text{not } \mathit{sat}.$ (4)

where $f$ is a new symbol.

The rules in (3) and (4) can be omitted if we ask a stable model generator to compute
the stable models containing $\mathit{sat}(c_1), \ldots, \mathit{sat}(c_m)$. This is what we assume in the rest of
this paper. We let $\mathit{sat}(C) = \{\mathit{sat}(c_1), \ldots, \mathit{sat}(c_m)\}$.

Given a CSP A(X, D, C), we denote by $P_A$ the logic program translated from it.
The size of $P_A$ is calculated as follows. For each variable, the uniqueness property is
expressed by at most $k$ rules with at most $k$ literals each. For binary constraints, the
number of tuples in a constraint is at most $k^2$. As we have $e$ constraints, we get

Proposition 1. For any (binary) CSP A(X, D, C), $P_A$ can be constructed in $O(ek^2)$
time. The size of $P_A$ is also bounded by $O(ek^2)$. ■
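
Parts (1) and (2) of the translation are mechanical enough to sketch in a few lines. The encoding below is an assumption of this illustration, not a format used by Smodels: an atom $x_i(a_j)$ is the pair `(variable, value)`, $\mathit{sat}(c)$ is the pair `("sat", name)`, and a rule is a `(head, positive_body, negative_body)` triple. Rules (3) and (4) are left out, in line with the convention adopted above.

```python
def translate(domains, constraints):
    """Build the uniqueness rules (1) and the constraint rules (2)."""
    rules = []
    for var, dom in domains.items():
        for a in dom:
            # x(a) <- not x(b) for every other value b of the same variable.
            others = [(var, b) for b in dom if b != a]
            rules.append(((var, a), [], others))
    for name, (scope, tuples) in constraints.items():
        for t in tuples:
            # sat(c) <- x1(a1), ..., xn(an) for every allowed tuple.
            rules.append((("sat", name), list(zip(scope, t)), []))
    return rules


# A small hypothetical CSP: one binary constraint over x in {0,1,2} and y in {0,1}.
domains = {"x": [0, 1, 2], "y": [0, 1]}
constraints = {"c": (("x", "y"), [(0, 0), (1, 1)])}
rules = translate(domains, constraints)
for head, pos, neg in rules:
    print(head, "<-", pos, "not", neg)
```

The loop structure mirrors the size argument: at most k rules of at most k literals per variable, plus one rule per allowed tuple of each constraint.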

One question one would like to ask is whether, under this translation, the propagation
by the expand function is powerful enough to enforce arc-consistency. The answer
is no, as shown by the following example.

Example 2. Consider a CSP A(X, D, C) with one constraint $c(x, y) = \{(0,0), (1,1)\}$,
where $D_x = \{0,1,2\}$ and $D_y = \{0,1\}$. Under Niemelä's translation, we have only one
rule with $x(2)$ as the head:

$x(2) \leftarrow \text{not } x(0), \text{not } x(1).$

and two rules with $\mathit{sat}(c)$ as the head:

$\mathit{sat}(c) \leftarrow x(0), y(0).$
$\mathit{sat}(c) \leftarrow x(1), y(1).$

While enforcing arc-consistency removes 2 from $D_x$, we do not deduce not $x(2)$ in
the call $\mathit{expand}(P_A, \mathit{sat}(C))$. However, if we perform lookahead, by picking $x(2)$
in $\mathit{lookahead\_once}(P_A, \mathit{sat}(C))$, $\mathit{expand}(P_A, \mathit{sat}(C) \cup \{x(2)\})$ deduces not $\mathit{sat}(c)$,
resulting in a conflict with $\mathit{sat}(C)$. ■

4.2 Lookahead is Stronger than AC

Theorem 1. Let A(X, D, C) be a CSP. Suppose after an application of AC, D is
reduced to D'. If for any variable $x_i \in X$, a value $d$ is removed from $D_{x_i}$, then
$\text{not } x_i(d) \in \mathit{lookahead}(P_A, \mathit{sat}(C))$.

Proof. Let $d_1, \ldots, d_s$ be the sequence of domain values removed from the respective
domains $D_{x_1}, \ldots, D_{x_s}$ by AC, in that order. We show that there is a sequence of
applications of lookahead_once, each resulting in not $x_i(d_i)$ being added, in the same order.
Then, by Lemma 2, because the set of literals returned by $\mathit{lookahead}(P_A, \mathit{sat}(C))$ is
independent of the order in which literals are picked, the existence of the latter sequence
is sufficient for the theorem to hold.

Suppose $d_1$ is removed from $D_{x_1}$ w.r.t. a constraint $c \in C$ whose scheme
includes $x_1$ and $y$ as variables, due to the fact that there is no value of $y$ that, together
with $d_1$ for $x_1$, is consistent with $c$. We show that if we pick atom $x_1(d_1)$ in
$\mathit{lookahead\_once}(P_A, \mathit{sat}(C))$, and continue with the call
$\mathit{expand}(P_A, \mathit{sat}(C) \cup \{x_1(d_1)\})$, a conflict will be deduced.

By applying propagation rule 3 in the function Atleast to the program rule in $P_A$
with $x_1(d_1)$ as the head, for each $d' \in D_{x_1}$ such that $d' \neq d_1$, we have
$\text{not } x_1(d') \in \mathit{expand}(P_A, \mathit{sat}(C) \cup \{x_1(d_1)\})$. Thus, the bodies of the rules with $\mathit{sat}(c)$ as the head
are all false except possibly those with $x_1(d_1)$ in the body. But since there is no $v \in D_y$
such that $d_1$ for $x_1$ and $v$ for $y$ would give a partial solution for $c$, there is simply no rule
in $P_A$ with $\mathit{sat}(c)$ as the head that contains $x_1(d_1)$ in the body. By propagation rule
2 in the function Atleast, we deduce not $\mathit{sat}(c)$, resulting in a conflict with $\mathit{sat}(C)$.

A similar argument applies if $d_1$ is removed by maintaining node-consistency.

The proof for the rest of the $d_i$'s is by exactly the same argument under the following
setting: after $d_1$ is removed, the resulting CSP can be re-expressed as A'(X, D', C), which is
equivalent to A(X, D, C) by the correctness of arc-consistency; and after obtaining
not $x_1(d_1)$, the resulting program where $x_1(d_1)$ is false (thus any rule with $x_1(d_1)$ in
the positive body is removed, and so are the only rule with $x_1(d_1)$ as the head and all
occurrences of not $x_1(d_1)$ in the body of any rule) is equivalent to the program $P_{A'}$. ■

Together with Theorem 1, the following example shows that lookahead is strictly 
stronger than AC. 

Example 3. Consider a CSP² that consists of three variables $x$, $y$ and $z$, all with the
domain $\{0,1\}$, and the following constraints:

$c_1(x, y) = \{(0,0), (1,1)\}$
$c_2(y, z) = \{(0,1), (1,0), (1,1)\}$
$c_3(z, x) = \{(0,0), (1,1)\}$

It is clear that AC cannot reduce any domain. A backtracking algorithm that begins
with the assignment $x \to 0$ would need to backtrack. Now, suppose $x(0)$ is selected by
lookahead_once. Then a conflict is arrived at, so that not $x(0)$ is added. One can verify
that the function lookahead alone is able to generate the unique solution to the CSP. ■
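
Two of the claims in Example 3 can be confirmed by brute force: every domain value has a support in every constraint (so AC removes nothing), and the CSP has exactly one solution. The helper names `unsupported` and `solutions` and the dictionary encoding are assumptions of this sketch; it does not run lookahead itself.

```python
from itertools import product

domains = {"x": [0, 1], "y": [0, 1], "z": [0, 1]}
constraints = {
    ("x", "y"): {(0, 0), (1, 1)},
    ("y", "z"): {(0, 1), (1, 0), (1, 1)},
    ("z", "x"): {(0, 0), (1, 1)},
}


def unsupported(domains, constraints):
    """(variable, value) pairs that arc-consistency would delete."""
    out = set()
    for (u, v), rel in constraints.items():
        for a in domains[u]:
            if not any((a, b) in rel for b in domains[v]):
                out.add((u, a))
        for b in domains[v]:
            if not any((a, b) in rel for a in domains[u]):
                out.add((v, b))
    return out


def solutions(domains, constraints):
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        asg = dict(zip(names, values))
        if all((asg[u], asg[v]) in rel for (u, v), rel in constraints.items()):
            yield asg


print(unsupported(domains, constraints))
print(list(solutions(domains, constraints)))
```

The first call returns the empty set, and the enumeration yields the single assignment in which all three variables take the value 1.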

2 This example was contributed by Dr. Guan-Shieng Huang in a group discussion at the Institute
of Information Science, Academia Sinica, Taipei.

5 AC with Unit Propagation

The insight revealed in the last section suggests that some search space pruning power is
missing in arc-consistency when compared with lookahead. In this section we identify what
is missing.

Given a CSP A(X, D, C) and a partial solution $\Pi$, we define a function that extends
$\Pi$ as follows:

$\mathit{unit\_propagate}(A, \Pi) = \Pi \cup \{x \to a \mid c(y, x) \in C \text{ or } c(x, y) \in C, \text{ and } y \to b \in \Pi \text{ such that } a \text{ is the only value in } D_x \text{ that is consistent with } b\}$

That is, in a binary constraint, if the unassigned variable has exactly one value in its
domain that is consistent with the value of the instantiated variable, the unassigned
variable must be given this value, which extends $\Pi$.

A collection of pairs $\Pi$ is said to be in conflict if and only if there are distinct values
$d$ and $d'$ such that for some variable $x$, $x \to d,\ x \to d' \in \Pi$. The notion of conflict is
needed because, as we will see shortly, the function unit_propagate may lead to such a
conflict.

The function unit_propagate*(A, $\Pi$) below calls unit_propagate(A, $\Pi$) repeatedly
until nothing can be further propagated or a conflict is reached. That is,

Function unit_propagate*(A, $\Pi$)
    repeat
        $\Pi'$ := $\Pi$
        $\Pi$ := unit_propagate(A, $\Pi'$)
        if $\Pi$ is in conflict then
            return "conflict"
    until $\Pi$ = $\Pi'$
    return $\Pi$.

Example 4. Let A(X, D, C) be a CSP with $X = \{x, y\}$, $D_x = D_y = \{0, 1\}$, and C
consisting of

$c_1(x, y) = \{(0,1), (1,0)\}$    $c_2(y, x) = \{(0,0), (1,1)\}$

Given $\Pi = \{x \to 0\}$, unit_propagate*(A, $\Pi$) returns "conflict" because the set
returned by unit_propagate(A, $\Pi \cup \{y \to 1\}$) is $\{x \to 0,\ y \to 1,\ x \to 1\}$. ■
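
Example 4 can be replayed with a direct transcription of unit_propagate and unit_propagate*. The sketch below is illustrative: a partial solution is a set of `(variable, value)` pairs, constraints are a dictionary from scopes to sets of allowed tuples, and the Python names are assumptions of this example. Because it applies every possible propagation in each round, it may detect the conflict through a different pair than the one singled out in the example; the returned answer is the same.

```python
def unit_propagate(domains, constraints, pi):
    """One round: add x -> a when a is the only value of x consistent with some y -> b in pi."""
    out = set(pi)
    for (u, v), rel in constraints.items():
        for var, val in pi:
            if var == u:
                support = [b for b in domains[v] if (val, b) in rel]
                if len(support) == 1:
                    out.add((v, support[0]))
            if var == v:
                support = [a for a in domains[u] if (a, val) in rel]
                if len(support) == 1:
                    out.add((u, support[0]))
    return out


def in_conflict(pi):
    """True iff some variable is mapped to two distinct values."""
    seen = {}
    return any(seen.setdefault(var, val) != val for var, val in pi)


def unit_propagate_star(domains, constraints, pi):
    pi = set(pi)
    while True:
        new = unit_propagate(domains, constraints, pi)
        if in_conflict(new):
            return "conflict"
        if new == pi:
            return new
        pi = new


domains = {"x": [0, 1], "y": [0, 1]}
constraints = {("x", "y"): {(0, 1), (1, 0)}, ("y", "x"): {(0, 0), (1, 1)}}
print(unit_propagate_star(domains, constraints, {("x", 0)}))
```

Dropping the second constraint makes the same call return the consistent extension containing x -> 0 and y -> 1.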

Now we strengthen the process of enforcing arc-consistency by adding unit
propagation, and name the resulting procedure AC+. Again, we describe it as an abstract,
nondeterministic procedure.

Algorithm AC+

AC+ takes as input a CSP A(X, D, C), and performs the following domain reduction
operations repeatedly until no domain can be further reduced.

1. For any $c \in C$, if there is exactly one uninstantiated variable $x$ in its scope, remove
any $d$ from $D_x$ if $d$ is inconsistent with the value of the instantiated variable in $c$.

2. For any $c \in C$ where both variables $x$ and $y$ in the scope of $c$ are uninstantiated,
remove any $d$ from $D_x$ if

(a) there is no value in $D_y$ that, together with $d$, is consistent with $c$; or

(b) unit_propagate*(A, $\{x \to d\}$) returns "conflict".

AC+ is obviously correct. The only addition is part (b), in which case the conflict
is a sufficient ground for removing $d$. Note that when the domains of all affected
variables have more than one consistent value, the process of unit propagation stops. So the
overhead is proportional to the number of occurrences of unique values for variables during
unit propagation.

Theorem 2. Let A(X, D, C) be a CSP. Suppose after an application of AC+, D is
reduced to D'. Then, for any variable $x \in X$, a value $d$ is removed from $D_x$ iff
$\text{not } x(d) \in \mathit{lookahead}(P_A, \mathit{sat}(C))$.

Proof (Sketch). As was done in the proof of Theorem 1, it suffices to consider the first
removal in the case of the CSP, and the first call to the function lookahead_once in the
other direction. Below, AC refers to the part of AC+ without unit propagation.

($\Rightarrow$) In conjunction with the proof of Theorem 1, we consider the effect
of unit propagation only. Suppose $a_0$ is removed from $D_{y_0}$ in AC+, because the call
unit_propagate*(A, $\{y_0 \to a_0\}$) returns "conflict". Suppose also that we pick
$y_0(a_0)$ in the function call $\mathit{lookahead\_once}(P_A, \mathit{sat}(C))$. (Note that this is a valid
assumption because the order of picking literals is unimportant, cf. Lemma 2.) We
need to show that the set returned by $\mathit{expand}(P_A, \mathit{sat}(C) \cup \{y_0(a_0)\})$ is in conflict.

Suppose the "conflict" returned by unit_propagate*(A, $\{y_0 \to a_0\}$) is due to the
sequence of unit propagations recorded as follows:

$\{y_0 \to a_0,\ y_1 \to a_1,\ \ldots,\ y_n \to a_n,\ y_{n+1} \to a_{n+1}\}$ (5)

where $y_0 = y_{n+1}$ and $a_0 \neq a_{n+1}$. We show, by a simple induction on $n$, that each
$y_{i+1}(a_{i+1})$, $0 \le i < n+1$, is deduced by the function Atleast, and therefore a conflict
is generated due to $y_0 = y_{n+1}$ and $a_0 \neq a_{n+1}$.

The base case is trivial as $y_0(a_0)$ is given in the call $\mathit{expand}(P_A, \mathit{sat}(C) \cup \{y_0(a_0)\})$.
Now, for any $i \ge 0$, assume $y_i(a_i)$ is deduced and show that $y_{i+1}(a_{i+1})$ is deduced.
Since $y_i(a_i)$ is deduced, by propagation rule 3 in Atleast, not $y_i(d')$ is deduced for
each $d' \in D_{y_i}$ such that $d' \neq a_i$. Thus, any rule encoding a constraint $c \in C$ involving
variable $y_i$ may only have a non-false body that contains $y_i(a_i)$. Let $c$'s scope include
$y_i$ and $y_{i+1}$. Now $a_{i+1}$ is the only value in $D_{y_{i+1}}$ that is consistent with $a_i$ for $y_i$. It
follows that there is exactly one rule with $\mathit{sat}(c)$ as the head whose body is not false, i.e.,

$\mathit{sat}(c) \leftarrow y_i(a_i), y_{i+1}(a_{i+1}).$

By propagation rule 3 in Atleast, $y_{i+1}(a_{i+1})$ is deduced; for $i = n$ this generates the conflict.

($\Leftarrow$) Suppose $\mathit{expand}(P_A, \{y_0(a_0)\} \cup \mathit{sat}(C))$ generates a conflict. We show that if
AC does not remove $a_0$ from $D_{y_0}$, unit_propagate*(A, $\{y_0 \to a_0\}$) in AC+ must
return "conflict".

Now assume AC does not remove $a_0$ from $D_{y_0}$. By propagation rule 3, $y_0(a_0)$
causes not $y_0(d)$, for any $d \in D_{y_0}$ such that $d \neq a_0$, to be deduced. Since AC does
not remove $a_0$ from $D_{y_0}$, for each constraint $c(y_0, y_1) \in C$ (similarly for any constraint
$c'(y_1, y_0) \in C$) there are rules in $P_A$ containing $y_0(a_0)$ in the body which are not made
false by the addition of not $y_0(d)$ where $d \in D_{y_0}$ and $d \neq a_0$. That is,

$\mathit{sat}(c) \leftarrow y_0(a_0), y_1(b_1).$
$\vdots$
$\mathit{sat}(c) \leftarrow y_0(a_0), y_1(b_k).$

where $k \ge 1$. It is clear that further deduction can be made only if $k = 1$, so that
propagation rule 3 becomes applicable. Clearly, this corresponds to the propagation
made by unit_propagate(A, $\{y_0 \to a_0\}$). Let us denote $b_1$ by $a_1$. Then we witness
the sequence as generated in Equation 5. As $\mathit{expand}(P_A, \{y_0(a_0)\} \cup \mathit{sat}(C))$ leads
to a conflict, the corresponding sequence, as in Equation 5, must have $y_0 = y_{n+1}$ and
$a_0 \neq a_{n+1}$. This leads to the generation of "conflict" by AC+. ■

6 Related Work 

In the literature it is often felt that arc-consistency may be too expensive to be beneficial
in real applications. Hence a restricted version, called forward checking (FC), where the
values from the domains of uninstantiated variables (also called future variables) are
filtered out if they are inconsistent with the current instantiation, is sometimes preferred.

Experiments show, however, that for problems with relatively tight constraints and
relatively sparse constraint graphs, algorithms where future variables are checked
against each other could substantially outperform FC [4]. This approach is called
arc-consistency look ahead, where full arc-consistency on all uninstantiated variables is
enforced following each tentative value assignment to the current variable. If a
variable's domain becomes empty during this process, the current candidate value is
rejected. An implementation of this approach is given in [2] (page 135, Fig. 5.10).

We use the following example to explain the difference between arc-consistency 
look ahead and arc-consistency with unit propagation. 

Example 5. Consider a CSP with $X = \{x_1, x_2, x_3, x_4\}$, all the variables with domain
$\{0, 1\}$, and the following constraints:

$c_1(x_1, x_2) = \{(0,0), (0,1), (1,1)\}$
$c_2(x_2, x_3) = \{(0,0), (0,1), (1,1), (1,0)\}$
$c_3(x_3, x_4) = \{(0,0), (1,1)\}$
$c_4(x_4, x_3) = \{(0,1), (1,0)\}$

One can see that $c_3$ and $c_4$ together cannot yield a consistent assignment between $x_3$
and $x_4$. But arc-consistency cannot detect it. In fact, arc-consistency enforced on this
CSP does not remove any domain value.

Now consider arc-consistency look ahead. Suppose the current variable is $x_1$. Let
us tentatively assign 0 to $x_1$. Since enforcing arc-consistency involves two variables,
with a tentative assignment to the current variable, the consistency check is among
triplets, which is much stronger than arc-consistency alone. However, one can see that since
the spot of conflict is unrelated to variable $x_1$, the inconsistency cannot be detected.

However, with AC+, 0 is removed from $D_{x_3}$, and so is 1, resulting in an empty domain.
Hence the fact that this CSP has no solution can be established without search. ■

In the above example, one can also see that if we perform arc-consistency look
ahead for all future variables, it then coincides with arc-consistency with unit
propagation. The precise relationship between arc-consistency with unit propagation and
arc-consistency look ahead is an interesting question to be answered in a further
investigation.
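
The behaviour of AC+ on Example 5 can be reproduced with a naive fixpoint loop that applies tests (a) and (b) of the algorithm to every value. This is a sketch under assumptions: all variables are treated as uninstantiated, the dictionary encoding and the names `neighbours`, `propagation_conflicts` and `ac_plus` are hypothetical, and no attention is paid to efficiency.

```python
def neighbours(domains, constraints, y, b):
    """For each constraint on y, yield (x, values of x consistent with y -> b)."""
    for (u, v), rel in constraints.items():
        if u == y:
            yield v, [d for d in domains[v] if (b, d) in rel]
        if v == y:
            yield u, [d for d in domains[u] if (d, b) in rel]


def propagation_conflicts(domains, constraints, var, val):
    """True iff unit propagation from {var -> val} assigns two values to one variable."""
    pi, todo = {var: val}, [(var, val)]
    while todo:
        y, b = todo.pop()
        for x, support in neighbours(domains, constraints, y, b):
            if len(support) == 1:
                a = support[0]
                if pi.get(x, a) != a:
                    return True
                if x not in pi:
                    pi[x] = a
                    todo.append((x, a))
    return False


def ac_plus(domains, constraints):
    doms = {v: list(d) for v, d in domains.items()}
    changed = True
    while changed:
        changed = False
        for x in doms:
            for d in list(doms[x]):
                no_support = any(not s for _, s in neighbours(doms, constraints, x, d))  # test (a)
                if no_support or propagation_conflicts(doms, constraints, x, d):         # test (b)
                    doms[x].remove(d)
                    changed = True
    return doms


domains = {"x1": [0, 1], "x2": [0, 1], "x3": [0, 1], "x4": [0, 1]}
constraints = {
    ("x1", "x2"): {(0, 0), (0, 1), (1, 1)},
    ("x2", "x3"): {(0, 0), (0, 1), (1, 1), (1, 0)},
    ("x3", "x4"): {(0, 0), (1, 1)},
    ("x4", "x3"): {(0, 1), (1, 0)},
}
result = ac_plus(domains, constraints)
print(result["x3"])
```

Both values of x3 fail test (b), since the two constraints between x3 and x4 force different values for x4, so the domain of x3 becomes empty without any search.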



7 Complexity Results 

Despite the popularity of Smodels, the complexity of its key component, lookahead, 
has not been studied carefully. In this section we will see that its complexity is higher 
than what researchers assumed previously. 

Theorem 3. Let P be a program, n be the size of P and m the number of distinct atoms
appearing in P. Then, the running time for lookahead is bounded by $O(nm^3)$.

Proof. The expand function in lookahead is bounded by $O(nm)$, as both functions
Atleast and Atmost can be computed in time linear in n, and there are at most m rounds
of iterations. For lookahead, since after 2m rounds of iterations (each atom is tested
for being true and false, respectively) at least one conflict must be deduced for it to
continue, there are at most $O(m^2)$ total rounds of iterations. Thus, a call to lookahead
takes at most $O(nm^3)$ time. ■

If we take program size as the only parameter, our bound translates to $O(n^4)$, which
is higher than the folklore $O(n^3)$. Now, by constructing a program on which lookahead
takes time proportional to $nm^3$ in the worst case, we show that this bound is optimal. That
is, our worst-case complexity cannot be lowered further.
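
The arithmetic behind Theorem 3 and the $O(n^4)$ remark can be written out as follows. The inequality $m \le n$ is an assumption stated here for the derivation: every atom counted in $m$ occurs at least once in $P$ and therefore contributes to the size $n$.

```latex
\begin{align*}
T_{\mathit{expand}}    &= O(m) \cdot O(n) = O(nm), \\
T_{\mathit{lookahead}} &= O(m^2) \cdot T_{\mathit{expand}} = O(nm^3), \\
m \le n \;&\Longrightarrow\; O(nm^3) \subseteq O(n^4).
\end{align*}
```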

We construct a program consisting of three parts, $P_1$, $P_2$, and $P_3$. The idea is that
$P_1$ will make lookahead call lookahead_once (hence the expand function) $O(m^2)$ times
in the worst case; in each call to expand, $P_2$ makes it call Atleast and Atmost $O(m)$
times; and $P_3$ makes each call to Atmost take $O(n)$ time to run.

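Let us construct $P_1$. We start with $2s$ atoms, $a_1, \ldots, a_s$ and $a'_1, \ldots, a'_s$. $P_1$ contains
four groups of rules, each of the form $h \leftarrow \text{not } b$ over these atoms: one group for
$i \in [1..s-1]$, one group for $i \in [1..s]$, a third group, and a final rule for $a_s$.

[The rule table of $P_1$ is garbled in the source; the individual heads and bodies are not recoverable.]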
The idea in this example is simple. Let us consider only the literals not $a_i$, $1 \le i \le s$.
Suppose each time lookahead_once is called, it picks literals in the order
not $a_1$, not $a_2$, ..., not $a_s$. In the first round, only when we get to not $a_s$ are we able
to deduce a conflict and hence conclude $a_s$. In the $i$th round we conclude $a_{s-i+1}$, and
so on. Thus, the total number of calls is $s(s+1)/2$, quadratic in $s$. That is, $P_1$ in the
worst case makes lookahead call expand $O(s^2)$ times.

Now we construct $P_2$. Suppose there are an additional $2s'$ atoms, $c_1, \ldots, c_{s'}$ and
$b_1, \ldots, b_{s'}$. Then $P_2$ is:

$\forall i \in [1..s]$,      $c_1 \leftarrow \text{not } a_i.$
$\forall i \in [1..s']$,     $b_i \leftarrow \text{not } c_i.$
                             $b_i \leftarrow b_i.$
$\forall i \in [1..s'-1]$,   $c_{i+1} \leftarrow \text{not } b_i.$

It is clear that each time not $a_i$ is picked by lookahead_once, it makes expand
call Atleast and Atmost, alternately, $s'$ times each. At the $i$th round, Atleast generates
$c_i$ and then not $b_i$ is added, as $b_i$ is not derivable by Atmost and becomes unfounded.
Thus, $P_2$ makes expand call Atleast and Atmost $O(s')$ times.

The goal of $P_3$ is to make Atmost thrash, so that linear time is taken each
time it is called. Suppose there are an additional $s''$ atoms, $p_1, \ldots, p_{s''}$. Then we
construct $P_3$ as:

$\forall i \in [2..s']$,      $p_1 \leftarrow \text{not } c_i.$
$\forall i \in [1..s''-1]$,   $p_{i+1} \leftarrow p_i.$

Recall that in the expand function Atmost is called after each call to Atleast, and $P_2$
makes this repetition $s'$ times. According to the algorithm that implements Atmost
([13], page 39, procedure atmost()), in the first call to Atmost, all $p_i$ are derived,
because none of the $c_i$, $2 \le i \le s'$, were deduced by the first call to Atleast. In the second
round, as Atleast generated $c_2$, the computation of Atmost must be redone. The
procedure atmost() first removes $p_1$ from the closure, due to the rule $p_1 \leftarrow \text{not } c_2$, and
subsequently all of the $p_i$'s; it then puts $p_1$ back into the closure due to the rules
$p_1 \leftarrow \text{not } c_i$, $i > 2$, and subsequently all $p_i$'s. Clearly, the running time is proportional
to $s''$. That is, each invocation of Atmost in this case takes $O(s'')$ time to run.

Note that in our construction, there is no restriction on the possible values for $s$, $s'$,
and $s''$. Thus, we may simply make $s = s' = s''$, and let $P = P_1 \cup P_2 \cup P_3$. Let m be
the number of distinct atoms appearing in P and n the size of P. It is clear that $s$ and
$s'$ are both linear in m and $s''$ is linear in n.

Putting all the arguments together, we conclude

Theorem 4. $O(nm^3)$ is the optimal worst-case time complexity for lookahead as
implemented in Smodels. ■



Note that this result is specifically for lookahead as implemented in Smodels. In
particular, it does not rule out the possibility of developing more efficient algorithms for
lookahead whose worst-case time complexities are lower. This is an interesting and important
question to be investigated in the future.

As the final result of this section, we give a bound for lookahead when it is used to
solve CSPs based on Niemelä's translation.

Theorem 5. Let A(X, D, C) be a (binary) CSP. Under the translation of Niemelä, the
time taken by lookahead is bounded by $O(e^3k^4)$.

Proof. The size of the translated program is bounded by $O(ek^2)$. By Lemma 3, since
the translated program has no positive loops, the bound on the expand function is also
$O(ek^2)$. There are $O(ek)$ atoms in the translated program. As lookahead_once
is called $O(e^2k^2)$ times in the worst case, each with one call to expand, we get the
total time $O(e^3k^4)$. ■

8 Future Directions 

Despite their proximity, constraint propagation in answer set programming
has rarely been compared with consistency techniques in solving CSPs. In this paper,
we establish a theoretical connection between arc-consistency used in CSPs and
lookahead in answer set programming. In particular, we show that arc-consistency enhanced
with unit propagation coincides with lookahead under Niemelä's translation from
binary CSPs to logic programs. Since Niemelä's translation does not contain positive
loops, the relationships studied here can be established similarly for lookahead as used
in SAT.

The relations established in this paper are only a first step in this direction.
Many questions remain, among which bounds consistency as used in CSPs
appears particularly interesting, not only because it is effective in reducing large numeric
domains, but also because it appears to have no counterpart in answer set solvers.
Traditionally, numeric constraints have been a weak spot in SAT-based approaches. We
would like to see the arsenal of constraint propagation techniques in answer set
programming also include a form of "bounds consistency", so that the search space due to
large numeric domains may be reduced effectively.

Answer set solvers may also benefit from global constraints such as AllDifferent. 
One way to extend an answer set solver with such a special constraint is by writing an 
answer set program that implements it. Another way is to build special propagation rules 
into an answer set solver to support special language constructs for special constraints. 

The relationships established in this paper are based on a particular translation from
CSPs to answer set programs. The question remains open as to whether the same
conclusions hold for other translations. Since lookahead in Smodels is a general algorithm
without fine tuning for binary constraints, the complexity given in Theorem 5 for
performing arc-consistency with unit propagation is quite high. It would be interesting to
design a special, more efficient algorithm for arc-consistency with unit propagation. One such
algorithm will be reported in a forthcoming paper by the authors.

On the other hand, we believe that the algorithm for lookahead as implemented in
Smodels can be improved. Any trick in our argument for Theorem 4 points to such a
possibility. For example, our argument depends on the implementation of the function
atmost() in Simons' thesis. Is there an algorithm for Atmost that can avoid the thrashing
behavior? As another example, our argument also relies on a particular order in
which literals are chosen by lookahead_once to make the worst case happen. Are there
heuristics that are likely to avoid bad orders? Here, the purpose is to reduce the
overhead of lookahead, hence these types of heuristics should be computable very
efficiently.

Abstract. The period constraint restricts the smallest period of a
sequence of domain variables to be equal to a given domain variable. This
paper first provides propositions for evaluating the feasible values of the
smallest period of a sequence of domain variables. Then, it gives
propositions for pruning the variables of the sequence in order to achieve one
out of several possible smallest periods. The generalisation of the period
constraint to the case where the equality between two domain variables
is replaced by any condition is also considered.



1 Introduction 

From a constraint perspective, a first way to cope with some problems encountered
in the area of combinatorial pattern matching [3] is to introduce variables
that take their values within a set of strings; this set is then represented by a
regular language. In this context, various constraints [6] can be implemented
using finite automata. However, if, for reasons of compatibility with existing
constraints, one wants to stick [9] to domain variables, then a way to proceed is to
replace each fixed letter s[i] of a sequence s by a domain variable¹; the
corresponding domain contains all the potential letters that s[i] can take. Now, with
each classical problem of computing a given sequence characteristic we associate
a constraint, which enforces a given domain variable to be equal to the
characteristic we consider on a sequence of domain variables. This process is illustrated
on the following concrete example. As a characteristic, consider the period of a
sequence of domain variables $V_0 V_1 \ldots V_{m-1}$ (i.e. the smallest natural number
$p$ such that $V_i = V_{i+p}$ for all $i \in \{0, 1, \ldots, m-p-1\}$) and the corresponding
constraint period(P, Sequence), where:

- P is a domain variable,

- Sequence is a list $[V_0, V_1, \ldots, V_{m-1}]$ of $m$ domain variables.

The period constraint holds iff P is the period of the sequence $V_0 V_1 \ldots V_{m-1}$.
For instance, period(3, [1, 1, 4, 1, 1, 4, 1, 1]) holds since 3 is the smallest period
out of all periods 3, 6, 7 and 8 of the sequence 1 1 4 1 1 4 1 1.
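
For a fully instantiated sequence, the definition can be evaluated directly. The short sketch below enumerates all periods of a ground sequence; the function name `periods` is an assumption of this illustration, and it says nothing about filtering domain variables.

```python
def periods(seq):
    """All p in 1..m such that seq[i] == seq[i + p] for every i in 0..m-p-1."""
    m = len(seq)
    return [p for p in range(1, m + 1)
            if all(seq[i] == seq[i + p] for i in range(m - p))]


seq = [1, 1, 4, 1, 1, 4, 1, 1]
print(periods(seq), min(periods(seq)))
```

Note that p = m always qualifies vacuously, so every non-empty sequence has a smallest period.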

* This work was undertaken when the author was at SICS. 

1 A domain variable is a variable that ranges over a finite set of integers; dom(V) 
denotes the set of possible values of variable V, while min(V) and max(V) denote 
respectively the minimum and the maximum value of V. 



B. Demoen and V. Lifschitz (Eds.): ICLP 2004, LNCS 3132, pp. 329-342, 2004. 
(c) Springer- Verlag Berlin Heidelberg 2004 
The period constraint was generalised in [1, page 110] by considering an
explicit binary constraint instead of the equality constraint used in its
original definition. This binary constraint is provided as a third argument of
the generalised period constraint. For instance, the constraint
period(2, [3, 1, 4, 6, 9, 7], <) holds since 3 < 4 < 9 and 1 < 6 < 7 both hold
(so 2 is a period), whereas 3 < 1 < 4 < 6 < 9 < 7 does not hold (so 1 is not a
period). As a motivating example for the period constraint and its generalisation,
consider a cyclic scheduling problem of the following type: given a list of persons 
and a list of possible working periods (i.e. morning, afternoon, night, day off), 
the task is to determine a schedule over a period of one month, which provides 
for each person and each day the corresponding working period. In addition, each 
person has to work in a cyclic way, i.e. according to an unknown work pattern 
that is repeated with the assumption that days off do not break any cycle. To 
model this kind of periodicity we replace the equality constraint X = Y by the 
constraint (X = Y) ∨ (X = 0) ∨ (Y = 0) (assuming that 0 corresponds to a day
off). The use of constraints for solving other types of cyclic scheduling problems 
is presented in [2] and [8]. 
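
As a hedged illustration of the generalised constraint on a ground sequence,
the sketch below tries each candidate period in increasing order and tests the
binary constraint on every pair of positions at distance p. The names
generalised_period and same_or_day_off, as well as the second test sequence,
are assumptions made for this example only.

```python
import operator


def generalised_period(seq, ctr):
    """Smallest p such that ctr(seq[i], seq[i + p]) holds for all valid i."""
    m = len(seq)
    for p in range(1, m + 1):
        # For p == m there is no pair to check, so the loop always returns.
        if all(ctr(seq[i], seq[i + p]) for i in range(m - p)):
            return p


def same_or_day_off(x, y):
    """(X = Y) or (X = 0) or (Y = 0), where 0 encodes a day off."""
    return x == y or x == 0 or y == 0


print(generalised_period([3, 1, 4, 6, 9, 7], operator.lt))         # 2
print(generalised_period([1, 2, 0, 1, 0, 3, 1], same_or_day_off))  # 3
```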

No filtering algorithm was available so far, either for the period constraint
itself or for its generalisation. As usual within constraint programming, we
only have partial information about the sequence we consider and its potential
smallest periods. So our problem is not to have an efficient algorithm for
computing the period of a given sequence [5], [7], but rather to perform the
following tasks:

— On one side, we want to discard from the domain of P those values that
cannot be the smallest period of any sequence that can be constructed from
the domains of $V_0, V_1, \ldots, V_{m-1}$ by replacing each variable by one
value of its domain,

— On the other side, we want to remove from the domains of
$V_0, V_1, \ldots, V_{m-1}$ those values for which we cannot generate any
sequence whose smallest period belongs to the domain of P.

As an illustrative example of the previous tasks, consider the constraint
period(P, $[V_0, V_1, V_2, V_3, V_4, V_5, V_6, V_7]$), where the initial domains
of the variables are as follows: dom(P) = {3, 4, 5, 8}, dom($V_0$) = {2},
dom($V_1$) = {2}, dom($V_2$) = {2, 6, 9}, dom($V_3$) = {2, 4},
dom($V_4$) = {2, 8}, dom($V_5$) = {2, 8}, dom($V_6$) = {2, 6, 8, 9},
dom($V_7$) = {2}. We first want to find out that 3 and 8 are not feasible
smallest periods for any of the potential sequences that can be generated from
the domains of $V_0, V_1, \ldots, V_7$. In fact, on the one hand, 3 is not a
feasible smallest period since all variables $V_0, V_1, \ldots, V_7$ would have
to be fixed to value 2, which would enforce a smallest period of 1 (footnote 2).
On the other hand, 8 is also not possible since 7 is a period, possibly not the
smallest, of the sequence $V_0 V_1 \cdots V_7$ (because $V_0$ and $V_7$ are both
fixed to 2). In order to achieve the only remaining smallest periods of 4 or 5,
we restrict the domains of the variables $V_5$ and $V_6$. Hence,
dom($V_5$) = {2} and dom($V_6$) = {2, 6, 9}.
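
The expected result of both filtering tasks can be cross-checked on this small
instance by exhaustive enumeration. The sketch below is not a filtering
algorithm from the paper, only a brute-force reference (exponential in m); the
name brute_force_filter is an assumption of this example.

```python
from itertools import product


def smallest_period(seq):
    m = len(seq)
    return next(p for p in range(1, m + 1)
                if all(seq[i] == seq[i + p] for i in range(m - p)))


def brute_force_filter(dom_p, doms):
    """Keep only the values that occur in at least one solution."""
    new_p = set()
    new_doms = [set() for _ in doms]
    for seq in product(*doms):          # every ground sequence
        p = smallest_period(seq)
        if p in dom_p:                  # seq is a solution
            new_p.add(p)
            for support, value in zip(new_doms, seq):
                support.add(value)
    return new_p, new_doms


dom_p = {3, 4, 5, 8}
doms = [{2}, {2}, {2, 6, 9}, {2, 4}, {2, 8}, {2, 8}, {2, 6, 8, 9}, {2}]
new_p, new_doms = brute_force_filter(dom_p, doms)
print(sorted(new_p))        # [4, 5]
print(sorted(new_doms[5]))  # [2]
print(sorted(new_doms[6]))  # [2, 6, 9]
```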

The paper is organised as follows. Section 2 sets up some basic required
notions from the area of combinatorial pattern matching. Section 3 shows how to
handle the ground case of the period constraint, i.e. the case when all
variables are already fixed. Section 4 presents some propositions and their
corresponding implementation, which allow discarding infeasible periods within a
sequence of domain variables. Finally, Section 5 introduces propositions that
combine the apparently feasible values of the smallest period P with the
potential values of the variables of the sequence $V_0 V_1 \cdots V_{m-1}$ in
order to prune the domains of $V_0, V_1, \ldots, V_{m-1}$. All propositions
related to pruning consider the basic period constraint itself. Their adaptation
for the generalised period constraint is systematically investigated.

2 Independently of the fact that value 1 is or is not in the domain of P.



2 Background 



This section follows the presentation of Crochemore et al. [4]. A sequence on
an alphabet A is a finite series of elements of A. In our context, A consists of
all those natural numbers that occur in the initial domains of the variables
$V_0, V_1, \ldots, V_{m-1}$ of the period constraint. Within the rest of this
paper, the expression “a sequence $V_0 V_1 \cdots V_{m-1}$” should be
interpreted as “all sequences, which can be obtained from the domains of the
variables $V_0, V_1, \ldots, V_{m-1}$ by replacing each variable by one value of
its domain”. The empty sequence is denoted by ε. The length of a sequence s is
the number of elements of its corresponding series and is denoted by |s|. For
i ∈ {0, 1, ..., |s| − 1}, s[i] denotes the i-th letter of s. Within a given
sequence, the character * denotes any letter of A. A natural number p such that
1 ≤ p ≤ |s| is a period of a non-empty sequence s if s[i] = s[i + p] for all
i ∈ {0, 1, ..., |s| − p − 1}. The period of a non-empty sequence s corresponds
to its smallest period (footnote 3) and is denoted by per(s). A sequence r is a
factor of a sequence s if there exist two sequences u and v such that s = urv;
when u = ε, r is a prefix of s; when v = ε, r is a suffix of s; when r ≠ s, r
is called a proper factor of s. The factor s[i] s[i + 1] ⋯ s[j] of s is denoted
by s[i..j]. A border of a non-empty sequence s is a proper factor of s, which is
both a prefix and a suffix of s. For a sequence s, Bord(s) denotes the longest
border among all the borders of s. A standard relation between the longest
border and the period of a non-empty sequence s is |s| − |Bord(s)| = per(s). As
an example consider again the sequence 1 1 4 1 1 4 1 1. It has four borders ε,
1, 1 1 and 1 1 4 1 1 and a smallest period of 8 − 5 = 3. One way of computing
the longest border of a non-empty sequence s is to calculate the border table of
s defined as bord[k] = |Bord(s[0..k])| for k ∈ {0, 1, ..., |s| − 1}. Crochemore
et al. (see [4, page 39]) give an O(|s|) algorithm for computing the border
table that is based on the following recurrence:



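$$\mathit{Bord}(ua) = \begin{cases} \mathit{Bord}(u)\,a & \text{if } \mathit{Bord}(u)\,a \text{ is a prefix of } u, \\ \mathit{Bord}(\mathit{Bord}(u)\,a) & \text{otherwise.} \end{cases} \qquad (1)$$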
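
A minimal Python sketch of a border table computation in the spirit of this
recurrence is given next. It is the usual linear-time scheme that tries the
borders of the prefix in decreasing length; it is not claimed to reproduce the
algorithm of [4] line by line, and the names border_table and per are chosen
for this illustration.

```python
def border_table(s):
    """bord[k] = length of the longest border of s[0..k]."""
    bord = [0] * len(s)
    for k in range(1, len(s)):
        b = bord[k - 1]
        # Try the borders of s[0..k-1] in decreasing length until one extends.
        while b > 0 and s[k] != s[b]:
            b = bord[b - 1]
        bord[k] = b + 1 if s[k] == s[b] else 0
    return bord


def per(s):
    """per(s) = |s| - |Bord(s)| for a non-empty sequence s."""
    return len(s) - border_table(s)[-1]


s = [1, 1, 4, 1, 1, 4, 1, 1]
print(border_table(s))  # [0, 1, 0, 1, 2, 3, 4, 5]
print(per(s))           # 3
```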



3 Within this paper keep in mind the difference between “a period” and “the period” 
of a sequence s. A sequence s may have several periods, but the period designates 
its minimum period. 
3 Handling the Ground Case of the period Constraint

Given a generalised period(P, $[V_0, V_1, \ldots, V_{m-1}]$, ctr) constraint
where $V_0, V_1, \ldots, V_{m-1}$ are fixed, we need to compute the period of
$V_0 V_1 \cdots V_{m-1}$ and check whether or not it belongs to dom(P). When ctr
corresponds to the equality constraint we can directly reuse the algorithm that
computes the border table mentioned at the end of the previous section and
exploit the identity per($V_0 V_1 \cdots V_{m-1}$) = m −
|Bord($V_0 V_1 \cdots V_{m-1}$)|. But what should we do when ctr is not the
equality constraint? To partially answer this question, we first extend the
notion of border and then show how to adapt the recurrence used for computing
the border under the hypothesis that ctr satisfies the transitivity property
(footnote 4). In that case, as for the equality constraint, we get a complexity
of O(m). Otherwise, if ctr does not satisfy the transitivity property, we use
the algorithm that checks each potential period; it has an overall complexity of
O($m^2$). These two algorithms will also be used in the rest of this paper for
evaluating the period of a completely fixed factor of a sequence.

Definition 1. A border of a non-empty sequence s according to a binary con- 
straint ctr corresponds to two proper factors u and v of s, such that: 

— u and v have the same length which is called the length of the border, 

— u is a prefix of s, 

— v is a suffix of s, 

— ctr(u[i], v[i]) holds for all i ∈ {0, 1, ..., |u| − 1}.

Those two longest proper factors of s are respectively denoted by Pbord(s, ctr)
and Sbord(s, ctr); Lbord(s, ctr) designates their common length.

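Proposition 1. Let ctr be a binary constraint satisfying the transitivity
property. Then:

$$\mathit{Lbord}(ua, ctr) = \begin{cases} \mathit{Lbord}(u, ctr) + 1 & \text{if } ctr(u[\mathit{Lbord}(u, ctr) + 1], a) \text{ holds,} \\ \mathit{Lbord}(\mathit{Pbord}(u, ctr)\,a,\ ctr) & \text{otherwise.} \end{cases} \qquad (2)$$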

Proof. Let $v = v_0 v_1 \cdots v_n = ua$ and p = Lbord(u, ctr). Then, for all
i ∈ {0, 1, ..., p − 1}, $ctr(v_i, v_{n-p+i})$ holds.

1. CASE where $ctr(v_p, v_n)$ holds:

$$\left.\begin{array}{l} \forall i \in \{0, 1, \ldots, p-1\}: ctr(v_i, v_{n-p+i}) \text{ holds} \\ ctr(v_p, v_n) \text{ holds} \end{array}\right\} \Rightarrow \forall i \in \{0, 1, \ldots, p\}: ctr(v_i, v_{n-p+i}) \text{ holds.}$$

Hence, Lbord(v, ctr) = p + 1. So Lbord(ua, ctr) = Lbord(u, ctr) + 1 is
satisfied.

2. CASE where $ctr(v_p, v_n)$ does not hold:

$\mathit{Pbord}(u, ctr)\,a = v_0 v_1 \cdots v_{p-1}\,a = v_0 v_1 \cdots v_{p-1} v_n$.
Let q = Lbord(Pbord(u, ctr)a, ctr). Then, for all j ∈ {0, 1, ..., q − 2},
$ctr(v_j, v_{(p+1)-q+j})$ and $ctr(v_{q-1}, v_n)$ both hold. When
j ∈ {0, 1, ..., q − 2}, (j + p − q) ∈ {p − q, p − q + 1, ..., p − 2} and for
i = p + 1 − q + j, $ctr(v_i, v_{n-p+i})$ becomes
$ctr(v_{p+1-q+j}, v_{n+1-q+j})$. Consider a given j ∈ {0, 1, ..., q − 2}. If ctr
satisfies the transitivity property then

$$\left.\begin{array}{l} ctr(v_j, v_{p+1-q+j}) \text{ holds} \\ ctr(v_{p+1-q+j}, v_{n+1-q+j}) \text{ holds} \end{array}\right\} \Rightarrow ctr(v_j, v_{n+1-q+j}) \text{ holds.}$$

So, for all j ∈ {0, 1, ..., q − 2}, $ctr(v_j, v_{n+1-q+j})$ holds. Furthermore,
$ctr(v_{q-1}, v_n)$ holds. Then, Lbord(ua, ctr) ≥ q.

Finally, since the recurrence (2) considers the borders of u in decreasing
lengths, it computes a border of maximum length. □

4 ∀X, Y, Z: (ctr(X, Y) ∧ ctr(Y, Z)) ⇒ ctr(X, Z).

When ctr satisfies the transitivity property, the algorithm based on the
previous recurrence can be adapted by replacing the inequality test (see [4,
page 39, line 4 of the algorithm]) with the test that the binary constraint ctr
does not hold.



4 Searching for the Infeasible Smallest Periods 

As a basic fact, we have that the period of a sequence of domain variables
$V_0 V_1 \cdots V_{m-1}$ belongs to {1, 2, ..., m}. The aim of this section is
to detect infeasible smallest periods for the sequence $V_0 V_1 \cdots V_{m-1}$.
For this purpose, we first introduce four propositions, which remove from dom(P)
infeasible periods according to the fact that some variables do not take the
same value. Finally, the last propositions of this section use the fact that, if
p is a feasible period for a given sequence s, it follows that the smallest
period of s cannot exceed p.

Proposition 2. Let r be a factor of a sequence s. Then, per(s) ≥ per(r).

Proof. Straightforward from the definition of the period of a sequence. □ 

Example 1. Consider the sequence s = * 5 * 0 2 1 0 * * 3 5 3 5 * * *, where *
stands for a not yet fixed variable. Since the factors 5, 0 2 1 0 and 3 5 3 5
have a respective period of 1, 3 and 2, the period of s is greater than or equal
to 3.

Proposition 2 is used for evaluating a lower bound of the period of a sequence
of variables $V_0 V_1 \cdots V_{m-1}$ by computing the period of each factor of
s that only consists of fixed variables of s and by stating that P is greater
than or equal to those computed periods. This is achieved in O(m) by scanning
the variables $V_0, V_1, \ldots, V_{m-1}$ from left to right and by computing
the border table of each maximal completely fixed factor of s.
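
The left-to-right scan can be sketched as follows. This is an illustration
under the assumption that a not yet fixed variable is encoded as None; the
names per and period_lower_bound are not from the paper.

```python
def per(s):
    """Smallest period of a fixed, non-empty sequence via its border table."""
    bord = [0] * len(s)
    for k in range(1, len(s)):
        b = bord[k - 1]
        while b > 0 and s[k] != s[b]:
            b = bord[b - 1]
        bord[k] = b + 1 if s[k] == s[b] else 0
    return len(s) - bord[-1]


def period_lower_bound(seq):
    """Largest period among the maximal fixed factors (None = not yet fixed)."""
    bound, run = 1, []
    for value in list(seq) + [None]:   # the trailing None flushes the last run
        if value is None:
            if run:
                bound = max(bound, per(run))
            run = []
        else:
            run.append(value)
    return bound


# Sequence of Example 1: * 5 * 0 2 1 0 * * 3 5 3 5 * * *
s = [None, 5, None, 0, 2, 1, 0, None, None, 3, 5, 3, 5, None, None, None]
print(period_lower_bound(s))  # 3
```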

Remark 1. Proposition 2 is still valid for any type of binary constraint. 

Proposition 3. Consider a sequence $s = V_0 V_1 \cdots V_{m-1}$ and two distinct
positions i and j such that 0 ≤ i < j ≤ m − 1 and $V_i \neq V_j$. Then, (j − i)
cannot be a period of s.
Proof. Straightforward from the definition of the period of a sequence. □

Example 2. Consider the sequence s = * 5 * * * * * 9 * * *. Since the
difference between the position of 9 and the position of 5 is equal to
7 − 1 = 6, 6 cannot be a period of s.

Proposition 3 is used to check each variable that was fixed since the last
time the period constraint was woken against the other fixed variables. When
the period constraint is posted, we create a table forbid[1..m] and initialise
all its entries to 0. Its (j − i)-th entry will be set to 1 as soon as we find
two fixed variables $V_i$ and $V_j$ such that $V_i \neq V_j$ (i < j). For each
newly fixed variable this is achieved in O(m).

Remark 2. Like Proposition 2, Proposition 3 is still valid for any type of
binary constraint, provided that we replace $V_i \neq V_j$ by the negation of
the binary constraint ctr.

Proposition 4. Consider a sequence $s = V_0 V_1 \cdots V_{m-1}$ and a natural
number p such that 1 ≤ p ≤ m − 1. If p is not a period of s then every number
that exactly divides p cannot be a period of s.

Proof. Let q be a divisor of p such that p = a · q (a ∈ ℕ*). We prove the
contrapositive statement. Assume q is a period of s. It implies that for any
i ∈ {0, 1, ..., m − p − 1}: $V_i = V_{i+q}$, $V_{i+q} = V_{i+2 \cdot q}$, ...,
$V_{i+(a-1) \cdot q} = V_{i+a \cdot q}$. Therefore, by transitivity,
$V_i = V_{i+a \cdot q}$, i.e. $V_i = V_{i+p}$, and so p is a period of s. □

Example 3. Consider again Example 2. Since 6 is not a period of s, the numbers
3, 2 and 1, which exactly divide 6, cannot be periods of s either.

We reuse the table forbid[1..m] introduced for the implementation of
Proposition 3. Each time the (j − i)-th entry is set to 1, all the natural
numbers that exactly divide (j − i) are removed from dom(P); the corresponding
entries in the table forbid[1..m] are also set to 1.
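
The combined use of Propositions 3 and 4 can be sketched as below. This is a
simplified, non-incremental version for the equality constraint (it rescans all
pairs of fixed variables instead of reacting to newly fixed ones); None encodes
a not yet fixed variable and the name forbidden_periods is illustrative.

```python
def forbidden_periods(seq):
    """Return the values d for which forbid[d] is set (None = not yet fixed)."""
    m = len(seq)
    forbid = [False] * (m + 1)          # index 0 is unused
    fixed = [(i, v) for i, v in enumerate(seq) if v is not None]
    for a, (i, vi) in enumerate(fixed):
        for j, vj in fixed[a + 1:]:
            if vi != vj:
                d = j - i
                # Proposition 3 forbids d; Proposition 4 forbids its divisors.
                for q in range(1, d + 1):
                    if d % q == 0:
                        forbid[q] = True
    return [d for d in range(1, m + 1) if forbid[d]]


# Sequence of Examples 2 and 3: * 5 * * * * * 9 * * *
s = [None, 5, None, None, None, None, None, 9, None, None, None]
print(forbidden_periods(s))  # [1, 2, 3, 6]
```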

Remark 3. From the previous proof, Proposition 4 is still valid for a binary
constraint that satisfies the transitivity property. We give an example showing
that Proposition 4 cannot be used when the binary constraint does not satisfy
the transitivity property. For this purpose, consider the binary constraint
defined by the condition (X = Y) ∨ (X = 0) ∨ (Y = 0). According to this
condition, 4 is not a period of the sequence 1 0 0 0 2 0 0 0 1 (since $V_0 = 1$
and $V_4 = 2$ are distinct and both take a value different from 0); however, 2
and 1 are periods of the previous sequence.

We now introduce a proposition which removes infeasible periods even when 
variables of the sequence are not yet fixed. 







Notation 1. Consider a sequence $s = V_0 V_1 \cdots V_{m-1}$ and a natural
number p such that 1 ≤ p ≤ m − 1. Let

$$I_i^p = \bigcap_{\substack{0 \le k \le m-1 \\ k \equiv i \pmod{p}}} \mathit{dom}(V_k)$$

be the intersection of the domains of the variables belonging to the i-th
(i ∈ {0, 1, ..., p − 1}) group of variables according to p. Note that if
p > ⌊m/2⌋ then $I_i^p = \mathit{dom}(V_i)$ for
i ∈ {m − p, m − p + 1, ..., p − 1}.



Proposition 5. Consider a sequence $s = V_0 V_1 \cdots V_{m-1}$ and a natural
number p such that 1 ≤ p ≤ m − 1. Then, (a) p is not a period of s if and only
if (b) there exists i ∈ {0, 1, ..., p − 1} such that $I_i^p = \emptyset$.

Proof.

— (a) ⇒ (b): If p is not a period then, by definition of a period, for any
sequence $v_0 v_1 \cdots v_{m-1}$ where $v_0 \in \mathit{dom}(V_0)$,
$v_1 \in \mathit{dom}(V_1)$, ..., $v_{m-1} \in \mathit{dom}(V_{m-1})$ there
exists i ∈ {0, 1, ..., m − p − 1} such that $v_i \neq v_{i+p}$. Therefore
$I_i^p = \emptyset$.

— (b) ⇒ (a): We prove the contrapositive statement. Assume that p is a period
of s. Then, there exist $v_0 \in \mathit{dom}(V_0)$,
$v_1 \in \mathit{dom}(V_1)$, ..., $v_{m-1} \in \mathit{dom}(V_{m-1})$ such that
$s = v_0 v_1 \cdots v_{m-1}$ has a period p, i.e. $v_i = v_{i+p}$ for
i ∈ {0, 1, ..., m − p − 1}. Therefore, $v_i \in I_i^p$ for
i ∈ {0, 1, ..., p − 1}, so none of these intersections is empty. □

Example 4. Consider the sequence s = $V_0$ * * * $V_4$ * * * $V_8$ * * with the
following domains for the variables: dom($V_0$) = {1, 2}, dom($V_4$) = {0, 2}
and dom($V_8$) = {0, 1}. Since the intersection
dom($V_0$) ∩ dom($V_4$) ∩ dom($V_8$) is empty, 4 cannot be a period of s.

Proposition 5 requires in the worst case $\frac{m \cdot (m-1)}{2}$
intersections between domains (m − 1 for p = 1, m − 2 for p = 2, ..., 1 for
p = m − 1). After application of Proposition 5, only feasible periods, possibly
not smallest feasible periods, remain in the domain of P.
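
A minimal sketch of this test for the equality constraint follows. It assumes
that each domain is given as a Python set and, for the instance of Example 4,
that the alphabet is {0, 1, 2} (the paper does not state it); the function
names are illustrative.

```python
def group_intersection(doms, p, i):
    """I_i^p: intersection of dom(V_k) over all k with k = i (mod p)."""
    return set.intersection(*(set(doms[k]) for k in range(i, len(doms), p)))


def feasible_periods(doms):
    """All p in 1..m for which every group intersection I_i^p is non-empty."""
    m = len(doms)
    return [p for p in range(1, m + 1)
            if all(group_intersection(doms, p, i) for i in range(p))]


# Example 4 with an assumed alphabet {0, 1, 2} for the '*' positions.
any_letter = {0, 1, 2}
doms = [{1, 2}, any_letter, any_letter, any_letter, {0, 2}, any_letter,
        any_letter, any_letter, {0, 1}, any_letter, any_letter]
print(group_intersection(doms, 4, 0))  # set()
print(feasible_periods(doms))          # [3, 5, 6, 7, 8, 9, 10, 11]
```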

Remark 4. Proposition 5 can be generalised, for a binary constraint ctr(X, Y),
if we provide a necessary and sufficient condition for checking whether a
conjunction of binary constraints of the form

$$\bigwedge_{\substack{0 \le k \le m-1 \\ k \equiv i \pmod{p}}} ctr(V_k, V_{k+p})$$

(i ∈ {0, 1, ..., p − 1}) has at least one solution. For instance, consider the
binary constraint (X = Y) ∨ (X = 0) ∨ (Y = 0) introduced in the motivating
example related to the cyclic scheduling problem. In this context, a necessary
and sufficient condition is that for every factor s[α..β] of s such that:

— (α ≡ i (mod p)) ∧ ((α − p < 0) ∨ (0 ∈ dom($V_{\alpha-p}$))),

— (β ≡ i (mod p)) ∧ ((β + p > m − 1) ∨ (0 ∈ dom($V_{\beta+p}$))),

— 0 ∉ dom($V_j$) for any j such that (α ≤ j ≤ β) ∧ (j ≡ i (mod p)),

the intersection

$$\bigcap_{\substack{\alpha \le k \le \beta \\ k \equiv i \pmod{p}}} \mathit{dom}(V_k)$$

is not empty.

We now come to propositions that handle the fact that P has to be the 
smallest period of sequence s. 



Proposition 6. Consider a sequence $s = V_0 V_1 \cdots V_{m-1}$ and a natural
number b such that 1 ≤ b ≤ ⌊m/2⌋. Let p denote the period of the sequence
s[0..b − 1] s[m − b..m − 1]. The period of s is less than or equal to
⌈b/p⌉ · p + m − 2 · b.

Proof. We search for a feasible period of s, no matter which values are assigned
to the m − 2 · b variables of s[b..m − b − 1]. For this purpose, we first remove
the previous variables from s and compute a feasible period p of the remaining
sequence. Then, we deduce a feasible period of s as follows:

— If p ≥ b (⌈b/p⌉ · p = p) then for all i ∈ {0, 1, ..., b − 1} we have
i + p ≥ m − b and p + m − 2 · b is a period of s,

— Else (p < b). We consider the smallest multiple ⌈b/p⌉ · p of p such that for
all i ∈ {0, 1, ..., b − 1}: i + ⌈b/p⌉ · p ≥ m − b. Then,
⌈b/p⌉ · p + m − 2 · b is a period of s.

□



Example 5. Consider the sequence s = 3 1 * * 8 * * * 2 3 1. Since the period of
the sequence 3 1 3 1 is 2, the period of s is less than or equal to
⌈2/2⌉ · 2 + 11 − 2 · 2 = 9, no matter what value is assigned to the not yet
fixed variables.



Proposition 6 is used to adjust the upper bound of the period of the sequence
$s = V_0 V_1 \cdots V_{m-1}$ by considering the concatenation of the longest
prefix and suffix of the same length made of fixed variables of s and by
computing its smallest period. This has a worst-case complexity of O(m).
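
The bound of Proposition 6 can be evaluated as in the following sketch, given
for the equality constraint only. None encodes a not yet fixed variable, the
period of the concatenation is computed naively rather than with a border
table, and the name upper_bound is an assumption of this example.

```python
def smallest_period(seq):
    m = len(seq)
    return next(p for p in range(1, m + 1)
                if all(seq[i] == seq[i + p] for i in range(m - p)))


def upper_bound(seq):
    """ceil(b/p) * p + m - 2*b for the longest fixed prefix/suffix of length b."""
    m = len(seq)
    b = 0
    while b < m // 2 and seq[b] is not None and seq[m - 1 - b] is not None:
        b += 1
    if b == 0:
        return m                          # no information: trivial bound
    p = smallest_period(seq[:b] + seq[m - b:])
    return -(-b // p) * p + m - 2 * b     # -(-b // p) is ceil(b / p)


# Example 5: 3 1 * * 8 * * * 2 3 1
s = [3, 1, None, None, 8, None, None, None, 2, 3, 1]
print(upper_bound(s))  # 9
```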



Remark 5. When p ≥ b Proposition 6 is valid for any binary constraint.
Otherwise (p < b), it is also still valid for a binary constraint that satisfies
the transitivity property: the transitivity is required to ensure that the
multiple ⌈b/p⌉ · p of period p is still a period.

Proposition 7. Consider a sequence $s = V_0 V_1 \cdots V_{m-1}$ and
s' = s[0..b − 1] s[m − b..m − 1] where b is the greatest integer in
{1, 2, ..., ⌊m/2⌋} such that all variables of s' are fixed but exactly one
single variable, for instance V. The period of s is less than or equal to

$$\max_{v \in \mathit{dom}(V)} \left( \left\lceil \frac{b}{p_v} \right\rceil \cdot p_v + m - 2 \cdot b \right)$$

where $p_v$ is the period of s' when V = v.
Proof. Proposition 7 is a generalisation of Proposition 6 in the sense that s'
contains a non-fixed variable. For each potential value of the domain of the 
non-fixed variable we compute an upper bound of the period of s and keep the 
maximum of the upper bounds. □ 

Example 6. Consider the sequence s = 1 $V_1$ 2 $V_3$ $V_4$ 2 1 1 2 with the
following domains for the variables: dom($V_1$) = {1, 2} and
dom($V_3$) = dom($V_4$) = {0, 1, ..., 9}. Then, s' = 1 $V_1$ 2 1 1 2,
$p_1 = 3$, $p_2 = 4$ and the period of s is less than or equal to
max(3, 4) + 9 − 2 · 3 = 7.

Proposition 7 is used to adjust the upper bound of the period of the sequence
$s = V_0 V_1 \cdots V_{m-1}$ by considering the concatenation of the leftmost
and rightmost groups of variables (of the same length) of s with one non-fixed
variable in the concatenation.
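
The maximum over the values of the single non-fixed variable can be sketched as
follows, again for the equality constraint. The function takes s' directly
(with None at the position of the free variable); its name and signature are
assumptions made for this illustration.

```python
def smallest_period(seq):
    m = len(seq)
    return next(p for p in range(1, m + 1)
                if all(seq[i] == seq[i + p] for i in range(m - p)))


def upper_bound_one_free(s_prime, free_index, domain, m):
    """Max over v of ceil(b/p_v) * p_v + m - 2*b, with b = len(s_prime) / 2."""
    b = len(s_prime) // 2
    best = 0
    for v in domain:
        fixed = list(s_prime)
        fixed[free_index] = v             # s' when the free variable is v
        p_v = smallest_period(fixed)
        best = max(best, -(-b // p_v) * p_v + m - 2 * b)
    return best


# Example 6: s' = 1 V1 2 1 1 2 with dom(V1) = {1, 2} and m = 9.
print(upper_bound_one_free([1, None, 2, 1, 1, 2], 1, {1, 2}, 9))  # 7
```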

Remark 6. When $\min_{v \in \mathit{dom}(V)}(p_v) \ge b$, Proposition 7 is valid
for any binary constraint. Otherwise, it is also still valid for a binary
constraint that satisfies the transitivity property.

Proposition 8. Consider a sequence $s = V_0 V_1 \cdots V_{m-1}$ and a natural
number p ∈ {2, 3, ..., m}. Assume that each $I_k^p$ (k ∈ {0, 1, ..., p − 1}) is
reduced to only one single value. If there exists a period q < p of
$I_0^p I_1^p \cdots I_{p-1}^p$ where q divides exactly p then p cannot be the
period of s.

Proof. Since $I_0^p I_1^p \cdots I_{p-1}^p$ is completely fixed, there exists
for sequence s only one solution of period p. Furthermore, if there exists a
period q < p of $I_0^p I_1^p \cdots I_{p-1}^p$ where q divides exactly p, then q
is also a period of s. Therefore p cannot be the smallest period of s. □

Example 7. Consider the sequence $s = V_0 V_1 \cdots V_{11}$ with the following
domains for the variables: dom($V_i$) = {0, 1, ..., 9} for
i ∈ {0, 1, 5, 6, 7, 8, 10, 11}, dom($V_2$) = dom($V_4$) = {1} and
dom($V_3$) = dom($V_9$) = {0}. For p = 4, we have $I_0^4 = I_2^4 = \{1\}$,
$I_1^4 = I_3^4 = \{0\}$ and the single solution s = 1 0 1 0 1 0 1 0 1 0 1 0.
Since 2 is a period of $I_0^4 I_1^4 I_2^4 I_3^4$ = 1 0 1 0, 4 cannot be the
period of s.

Proposition 8 is used to check that an apparently feasible period p of a
sequence $s = V_0 V_1 \cdots V_{m-1}$ is not the period of s. For this purpose,
we compute $I_0^p I_1^p \cdots I_{p-1}^p$ and, if this sequence is completely
fixed, we calculate its smallest period q. If that period q is strictly less
than p and divides exactly p, we remove p from dom(P). Assuming that
$I_0^p, I_1^p, \ldots, I_{p-1}^p$ were already computed (see Notation 1), the
complexity of testing an apparently feasible period p is equal to the
complexity of computing the period of $I_0^p I_1^p \cdots I_{p-1}^p$, i.e. O(p).
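
A sketch of this check for the equality constraint is given below. Domains are
assumed to be Python sets, the period of the word of intersections is computed
naively instead of in O(p), and the name refuted_by_prop8 is illustrative.

```python
def smallest_period(seq):
    m = len(seq)
    return next(p for p in range(1, m + 1)
                if all(seq[i] == seq[i + p] for i in range(m - p)))


def group_intersection(doms, p, i):
    return set.intersection(*(set(doms[k]) for k in range(i, len(doms), p)))


def refuted_by_prop8(doms, p):
    """True when Proposition 8 shows that p cannot be the smallest period."""
    groups = [group_intersection(doms, p, i) for i in range(p)]
    if any(len(g) != 1 for g in groups):
        return False                      # the proposition does not apply
    word = [next(iter(g)) for g in groups]
    q = smallest_period(word)
    return q < p and p % q == 0


# Example 7: V2 = V4 = 1, V3 = V9 = 0, all other domains are {0, ..., 9}.
free = set(range(10))
doms = [free, free, {1}, {0}, {1}, free, free, free, free, {0}, free, free]
print(refuted_by_prop8(doms, 4))  # True
print(refuted_by_prop8(doms, 2))  # False
```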

Remark 7. Proposition 8 is still valid for a binary constraint which satisfies
the equivalence property (footnote 5).

5 ctr satisfies the equivalence property if ctr has the following three
properties: reflexivity (∀X: ctr(X, X)), symmetry (∀X, Y: ctr(X, Y) ⇒ ctr(Y, X))
and transitivity (∀X, Y, Z: (ctr(X, Y) ∧ ctr(Y, Z)) ⇒ ctr(X, Z)).
The following weaker form of Proposition 8 is first used for pruning the
period of a sequence. This is because it has an overall complexity of O(m) that
is independent of the number of periods to check.

Proposition 9. Consider a sequence s containing a factor $a^k$, where a stands
for an element of the alphabet A and k > 1 for a strictly positive natural
number. Then, the period of s cannot be equal to 2, 3, ..., k.

Proof. Consider any p ∈ {2, 3, ..., k}. The fact that s contains a factor $a^k$
implies that $I_i^p = \{a\}$ for i ∈ {0, 1, ..., p − 1}. Since
$I_0^p I_1^p \cdots I_{p-1}^p$ has a period of 1 < p and since 1 divides exactly
p, Proposition 8 tells us that p cannot be the period of s. □

Example 8. Consider the sequence s = * 0 0 0 0 * *. Since s contains the factor
$0^4$, its period cannot be equal to 2, 3 or 4.

Proposition 9 is used by computing the size of the largest factor for which all 
variables are fixed to the same value and by pruning P according to this number. 
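
This scan can be sketched in a few lines. None encodes a not yet fixed
variable; the function names and the initial domain of P used in the usage
example are assumptions.

```python
def longest_constant_run(seq):
    """Length k of the longest factor a^k made of fixed, equal values."""
    best = run = 0
    prev = None
    for value in seq:
        if value is None:
            run = 0
        elif value == prev:
            run += 1
        else:
            run = 1
        prev = value
        best = max(best, run)
    return best


def prune_by_prop9(dom_p, seq):
    """Remove the values 2, 3, ..., k from dom(P)."""
    k = longest_constant_run(seq)
    return {p for p in dom_p if not 2 <= p <= k}


# Example 8: * 0 0 0 0 * *
s = [None, 0, 0, 0, 0, None, None]
print(longest_constant_run(s))                           # 4
print(sorted(prune_by_prop9({1, 2, 3, 4, 5, 6, 7}, s)))  # [1, 5, 6, 7]
```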

5 Pruning According to the Potential Periods 

The aim of this section is to give rules for pruning the variables
$V_0, V_1, \ldots, V_{m-1}$ so as to enforce the period of
$V_0 V_1 \cdots V_{m-1}$ to be one of the values of the domain of P. Most of
these rules are based on the propositions presented in the previous section. We
start with a proposition directly derived from the definition of the period of a
sequence.

Proposition 10. Consider a sequence $s = V_0 V_1 \cdots V_{m-1}$.

1. For all i such that max(P) ≤ i ≤ m − 1 we remove from dom($V_i$) all values
that do not belong to

$$\bigcup_{\substack{0 \le l < i \\ (i-l) \in \mathit{dom}(P)}} \mathit{dom}(V_l).$$

2. For all i such that 0 ≤ i ≤ m − 1 − max(P) we remove from dom($V_i$) all
values that do not belong to

$$\bigcup_{\substack{i < k \le m-1 \\ (k-i) \in \mathit{dom}(P)}} \mathit{dom}(V_k).$$

Proof. From the definition of the period of a sequence, if i − max(P) ≥ 0 then
$V_i$ has to be equal to a variable $V_l$ such that (i − l) ∈ dom(P). Using
constructive disjunction (footnote 6) [10] on the previous equalities leads to
removing all values not in

$$\bigcup_{\substack{0 \le l < i \\ (i-l) \in \mathit{dom}(P)}} \mathit{dom}(V_l).$$

The second part of Proposition 10 is proved in a similar way. □



6 We remove those values that are discarded by all the equality constraints
associated with the potential periods.
Example 9. Consider the sequence $s = V_0 V_1 V_2 V_3 V_4 V_5$ with the
following domains for the variables: dom($V_0$) = {0, 1, 2, 8},
dom($V_1$) = {0, 1}, dom($V_2$) = {2, 8}, dom($V_3$) = {0, 1},
dom($V_4$) = {1, 2, 6} and dom($V_5$) = {0, 1}. Furthermore, assume that P has
to take value 2 or value 4. Part 1 of Proposition 10 considers variables $V_4$
and $V_5$: for $V_4$, since dom($V_0$) ∪ dom($V_2$) = {0, 1, 2, 8}, 6 ∈
dom($V_4$) has to be removed from dom($V_4$) in order to achieve a smallest
period of 2 or 4. No pruning occurs for $V_5$. Similarly, from part 2 of
Proposition 10, since dom($V_2$) ∪ dom($V_4$) = {1, 2, 6, 8}, 0 ∈ dom($V_0$) has
to be removed from dom($V_0$).
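
Both parts of Proposition 10 can be sketched directly with set unions, as
below. This is a single, non-iterated pass for the equality constraint that
reads the original domains; the name prune_by_prop10 is illustrative.

```python
def prune_by_prop10(dom_p, doms):
    """Return the pruned copies of the domains after one pass of both parts."""
    m, max_p = len(doms), max(dom_p)
    new = [set(d) for d in doms]
    for i in range(m):
        if i >= max_p:                    # part 1: neighbours on the left
            left = set().union(*(doms[i - p] for p in dom_p if i - p >= 0))
            new[i] &= left
        if i <= m - 1 - max_p:            # part 2: neighbours on the right
            right = set().union(*(doms[i + p] for p in dom_p if i + p <= m - 1))
            new[i] &= right
    return new


# Example 9 with dom(P) = {2, 4}.
doms = [{0, 1, 2, 8}, {0, 1}, {2, 8}, {0, 1}, {1, 2, 6}, {0, 1}]
pruned = prune_by_prop10({2, 4}, doms)
print(sorted(pruned[0]))  # [1, 2, 8]
print(sorted(pruned[4]))  # [1, 2]
```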

For a given value val, finding all variables from which it has to be removed
according to Proposition 10 is similar to the problem of finding all occurrences
of a sequence r which contains jokers within a sequence t which does not contain
any joker. A joker is a special letter $ which does not belong to the alphabet A
and which can be matched to any character of the alphabet A. The length of
sequence t is m and its i-th position t[i] is equal to 1 if val ∈ dom(V_i) and 0
otherwise. The sequence r, which we want to locate within t, is defined as
follows:

— r[0] = 1 represents the value we want to prune,

— For i ∈ dom(P): r[i] = 0 (0 represents the fact that we do not want to find
value val at any position corresponding to a potential period of s),

— For i ∉ dom(P) and i < max(P): r[i] = $ ($ represents the fact that we do not
care whether value val is found or not at those positions that do not
correspond to potential periods of s).

Using a standard algorithm from [4, page 266] allows finding all occurrences of
r within t in time O(n_P · m), where n_P is the number of intervals of
consecutive values of dom(P).

Example 10. Consider again the previous example where s = V_0 V_1 V_2 V_3 V_4
V_5 with the following domains for the variables: dom(V_0) = {0, 1, 2, 8},
dom(V_1) = {0, 1}, dom(V_2) = {2, 8}, dom(V_3) = {0, 1}, dom(V_4) = {1, 2, 6}
and dom(V_5) = {0, 1}. Assume that we want to locate those variables from which
we can remove value 0. We first build the sequence t = 1 1 0 1 0 1 for which the
i-th (0 ≤ i < m) position contains 1 when 0 ∈ dom(V_i) and 0 otherwise. Since
dom(P) = {2, 4}, we search for the occurrences of the sequence r = 1 $ 0 $ 0
within t and find one single match when the first positions of both sequences
coincide. Therefore, we can remove value 0 from dom(V_0).
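
The construction of t and r can be illustrated as follows. The matcher is a
naive quadratic scan, used here only as a stand-in; it is not the algorithm of
[4], and it only reports positions where the whole pattern fits inside t. All
names are assumptions of this sketch.

```python
JOKER = "$"


def build_pattern(dom_p):
    """r[0] = 1, r[p] = 0 for p in dom(P), joker everywhere else."""
    r = [JOKER] * (max(dom_p) + 1)
    r[0] = 1
    for p in dom_p:
        r[p] = 0
    return r


def occurrences(r, t):
    """Start positions where r matches t, a joker matching any letter."""
    return [pos for pos in range(len(t) - len(r) + 1)
            if all(c == JOKER or c == t[pos + k] for k, c in enumerate(r))]


doms = [{0, 1, 2, 8}, {0, 1}, {2, 8}, {0, 1}, {1, 2, 6}, {0, 1}]
val = 0
t = [1 if val in d else 0 for d in doms]
r = build_pattern({2, 4})
print(t)                  # [1, 1, 0, 1, 0, 1]
print(r)                  # [1, '$', 0, '$', 0]
print(occurrences(r, t))  # [0]
```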

Remark 8. Proposition 10 is still valid for a binary constraint for which we can
provide a complete filtering algorithm for a conjunction of binary constraints
of the form $ctr(X, Y_1) \wedge ctr(X, Y_2) \wedge \cdots \wedge ctr(X, Y_n)$.

In Proposition 10 we consider, for a variable $V_i$, only its immediate
neighbours according to all its potential periods. In order to remove more
values one could restrict dom($V_i$) to:

$$\bigcup_{p \in \mathit{dom}(P)} I^p_{i \bmod p}.$$

In that case, pruning all the domains of variables $V_0, V_1, \ldots, V_{m-1}$ requires
m ■ size(P) unions and m — — intersections between domains, where size(P) 
denotes the number of values of dom(P). Note that it still does not allow to get 
a complete pruning since potential periods and not smallest potential periods 
are considered. 

The next proposition is derived from Proposition 4 and Proposition 5. 

Proposition 11. Let q denote the least common multiple of the potential values
of P and assume q < m. For each i ∈ {0, 1, ..., min(q, m − q) − 1}, the domains
of the variables $V_i, V_{i+q}, V_{i+2 \cdot q}, \ldots$ (i.e. the variables of
the i-th group according to q) are restricted to $I_i^q$.

Proof. Since q is a multiple of the period of s (no matter which values are
assigned to the not yet fixed variables of s), the contrapositive of
Proposition 4 tells us that q is also a period of s. We prune according to that
fact. □

Example 11. Consider the sequence $s = V_0 V_1 \cdots V_9$ with the following
domains for the variables: dom($V_0$) = dom($V_1$) = dom($V_2$) = dom($V_3$) =
dom($V_4$) = {1, 2} and dom($V_5$) = dom($V_6$) = dom($V_7$) = dom($V_8$) =
dom($V_9$) = {0, 2}. Furthermore, assume that dom(P) = {1, 2, 3}. Since 6 is the
least common multiple of the previous values, the domains of the following pairs
of variables ($V_0$, $V_6$), ($V_1$, $V_7$), ($V_2$, $V_8$) and ($V_3$, $V_9$)
are respectively restricted to the intersections dom($V_0$) ∩ dom($V_6$) = {2},
dom($V_1$) ∩ dom($V_7$) = {2}, dom($V_2$) ∩ dom($V_8$) = {2} and
dom($V_3$) ∩ dom($V_9$) = {2}.
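
A hedged sketch of this pruning rule for the equality constraint is shown next.
Domains are assumed to be Python sets and the name prune_by_prop11 is
illustrative; the least common multiple is built from math.gcd.

```python
from functools import reduce
from math import gcd


def prune_by_prop11(dom_p, doms):
    """Restrict each group of variables modulo q = lcm(dom(P)) to I_i^q."""
    m = len(doms)
    q = reduce(lambda a, b: a * b // gcd(a, b), dom_p)
    new = [set(d) for d in doms]
    if q < m:
        for i in range(min(q, m - q)):
            group = range(i, m, q)
            common = set.intersection(*(set(doms[k]) for k in group))
            for k in group:
                new[k] = set(common)
    return q, new


# Example 11: dom(V0..V4) = {1, 2}, dom(V5..V9) = {0, 2}, dom(P) = {1, 2, 3}.
doms = [{1, 2}] * 5 + [{0, 2}] * 5
q, pruned = prune_by_prop11({1, 2, 3}, doms)
print(q)                            # 6
print([sorted(d) for d in pruned])
```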

Remark 9. Proposition 11 is still valid for a binary constraint ctr(X, Y) which
satisfies the transitivity property and for which we can provide a complete
filtering algorithm for a conjunction of binary constraints of the form

$$\bigwedge_{\substack{0 \le k \le m-1 \\ k \equiv i \pmod{q}}} ctr(V_k, V_{k+q}) \qquad (i \in \{0, 1, \ldots, q-1\}).$$

The next proposition is directly derived from Proposition 9. 

Proposition 12. Consider a sequence $s = V_0 V_1 \cdots V_{m-1}$ containing a
factor $a^{k_1}\, V_i\, a^{k_2}$, where a stands for an element of the alphabet
A, $k_1$ and $k_2$ for two strictly positive natural numbers and $V_i$
(1 ≤ i ≤ m − 2) for a not yet fixed domain variable. If the domain of the period
of s is included within {2, 3, ..., $k_1 + k_2 + 1$} then $V_i$ cannot take
value a.

Proof. The contrapositive of Proposition 9 tells us that, if the period p of s
is in dom(P) and satisfies p ≤ $k_1 + k_2 + 1$, then s cannot contain the factor
$a^{k_1+k_2+1}$. Therefore $V_i$ cannot take value a. □

Example 12. Consider the sequence s = 0 0 $V_2$ 0 $V_4$ $V_5$ and assume the
period of s to be 2, 3 or 4. Since s contains the factor $0^2\, V_2\, 0^1$,
$V_2$ cannot take 0 as value.
A simple scan through positions 0, 1, ..., m − 1 of sequence s allows locating
those factors $a^{k_1}\, V_i\, a^{k_2}$ from which one can prune a not yet fixed
variable $V_i$ (1 ≤ i ≤ m − 2).
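
The scan can be sketched as follows. The encoding (seq[i] holds the fixed value
or None, doms[i] the current domain) and the domains chosen for the non-fixed
variables of Example 12 are assumptions of this illustration.

```python
def prune_by_prop12(dom_p, seq, doms):
    """Remove a from dom(V_i) when a^k1 V_i a^k2 and dom(P) within 2..k1+k2+1."""
    m = len(seq)
    new = [set(d) for d in doms]
    for i in range(1, m - 1):
        if seq[i] is not None:
            continue                      # V_i is already fixed
        a = seq[i - 1]
        if a is None or seq[i + 1] != a:
            continue                      # no factor a^k1 V_i a^k2 around i
        k1 = 1
        while i - 1 - k1 >= 0 and seq[i - 1 - k1] == a:
            k1 += 1
        k2 = 1
        while i + 1 + k2 < m and seq[i + 1 + k2] == a:
            k2 += 1
        if all(2 <= p <= k1 + k2 + 1 for p in dom_p):
            new[i].discard(a)
    return new


# Example 12: s = 0 0 V2 0 V4 V5, with assumed domains {0, 1} for V2, V4, V5.
seq = [0, 0, None, 0, None, None]
doms = [{0}, {0}, {0, 1}, {0}, {0, 1}, {0, 1}]
print(prune_by_prop12({2, 3, 4}, seq, doms)[2])  # {1}
```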

Let us now introduce Proposition 13, which is derived from Proposition 6. 



Proposition 13. Consider a sequence $s = V_0 V_1 \cdots V_{m-1}$ and
s' = s[0..b − 1] s[m − b..m − 1] where b is the greatest integer in
{1, 2, ..., ⌊m/2⌋} such that all variables of s' are fixed but exactly one
single variable, for instance V. Let v ∈ dom(V) and p the period of s' when V is
fixed to v. If ⌈b/p⌉ · p + m − 2 · b is strictly less than the minimum value of
dom(P) then V cannot take value v.

Proof. If the period of s' is p then Proposition 6 tells us that the period of s
is less than or equal to ⌈b/p⌉ · p + m − 2 · b. Therefore, if
⌈b/p⌉ · p + m − 2 · b is strictly less than the minimum value of dom(P), s has a
period not in dom(P). So to avoid it, v has to be removed from dom(V). □



Example 13. Consider the sequence s = 1 2 3 6 * 9 1 $V_7$ 3 where $V_7$
corresponds to a not yet fixed domain variable. If the period of s is strictly
greater than 6 (i.e. min(dom(P)) > 6) then $V_7$ should not be fixed to value 2.
Otherwise s would have the border 1 2 3 and therefore a smallest period less
than or equal to 6, which contradicts the hypothesis.
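
For the equality constraint this rule can be sketched as below. The function
takes s' directly, with None at the position of the free variable; its name,
its signature and the domain {1, 2, 3} used for the free variable of Example 13
are assumptions of this illustration.

```python
def smallest_period(seq):
    m = len(seq)
    return next(p for p in range(1, m + 1)
                if all(seq[i] == seq[i + p] for i in range(m - p)))


def prune_by_prop13(min_p, s_prime, free_index, domain, m):
    """Keep the values v whose bound ceil(b/p) * p + m - 2*b reaches min(dom(P))."""
    b = len(s_prime) // 2
    kept = set()
    for v in domain:
        fixed = list(s_prime)
        fixed[free_index] = v             # s' when the free variable is v
        p = smallest_period(fixed)
        if -(-b // p) * p + m - 2 * b >= min_p:
            kept.add(v)
    return kept


# Example 13: s' = 1 2 3 1 V7 3, m = 9 and min(dom(P)) = 7.
print(sorted(prune_by_prop13(7, [1, 2, 3, 1, None, 3], 4, {1, 2, 3}, 9)))  # [1, 3]
```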



Remark 10. Remark 5 also holds for Proposition 13. 



We conclude this section with a remark concerning the implementation of the
propositions introduced in this paper. We apply in priority Propositions 1, 2,
3, 4, 6, 9 and 12, where only fixed variables of the sequence s are considered.
These propositions are used each time a new variable of s is fixed. Then, we
apply Propositions 7 and 13 on the prefix and the suffix of s where only one
single non-fixed variable is considered. Concerning Propositions 5, 8, 10 and
11, they might require a large number of intersections or unions of domains and
it is still an open question how to use them in an efficient way.



6 Conclusion 

We have revisited the classical notion of period of a sequence from a constraint 
point of view. We have extended this notion in order to handle any comparison 
condition between two characters. Finally, for each proposition Pr, we have 
systematically shown how to extend it (when possible) or how to characterise a 
required property on the comparison condition for using Pr. 

From a broader perspective we hope that the paper will awaken interest in
turning other classical problems from the area of combinatorial pattern matching,
such as distance between sequences or local periods, into new constraints. In
fact, it is often possible to turn an algorithm which computes a result from an
input into a constraint. This is achieved by breaking the distinction between input
and output parameters.
Abstract. Arc-Consistency (AC) techniques have been used extensively in the
study of Constraint Satisfaction Problems (CSP). These techniques are used to
simplify the CSP before or during the search for its solutions. Some of the most
efficient algorithms for AC computation are AC6++ and AC-7. The novelty of
these algorithms is that they satisfy the so-called four desirable properties for
AC computation. The main purpose of these properties is to reduce
as far as possible the number of constraint checks during AC computation while
keeping a reasonable space complexity. In this paper we prove that, despite
providing a remarkable reduction in the number of constraint checks, the four
desirable properties do not guarantee a minimal number of constraint checks. We
therefore refute the minimality claim in the paper introducing these properties.
Furthermore, we propose a new desirable property for AC computation and
extend AC6++ and AC-7 to consider such a property. We show theoretically and
experimentally that the new property provides a further substantial reduction in
the number of constraint checks.



1 Introduction 

Constraint satisfaction problems (CSP) occur widely in engineering, science and the
arts. Applications are frequently reported in production planning, resource allocation
[BLN01], music composition [AADR98], verification [EM97], security [BB01],
bioinformatics [GW94] and many others. In fact, a CSP is any problem that can be expressed
as that of finding, from a finite set of possibilities, a collection of values satisfying
some given particular properties. These properties are represented by relations called
constraints.

In its general setting the constraint satisfaction problem has been proved to be
NP-complete. Nevertheless, in many real world instances a solution can be found with
reasonable time and space efficiency when appropriate techniques are applied. The most
frequently used are so-called consistency techniques. The main idea in these techniques
is to use constraints not only to test the validity of a solution but also as devices
for detecting inconsistencies and for pruning from the original set of possibilities some
values that cannot appear in a solution. The reduced CSP taking into account only the
remaining values is said to satisfy a given (weak) notion of consistency. One such
notion is arc consistency. Consistency procedures are usually invoked repeatedly during
search so it is very important to have efficient consistency algorithms. Even savings by
a constant factor can have important overall performance consequences in some
situations.

Finding better arc consistency algorithms has thus been an ongoing research topic 
for more than two decades. Building from algorithm AC-3 [Mac77], improvements 
such as AC-4 [MH86], AC-6 [Bes94], AC6++, AC-7 (see [BR99]) have been proposed. 
The standard way to compare arc consistency algorithms is by the number of constraint 
checks they perform. In [BR95] the so-called four desirable properties (FDP) for AC 
algorithms were defined and shown to provide a remarkable reduction in the number of 
constraint checks. Moreover, in [BR95] it is claimed that algorithms (such as AC6++ 
and AC-7) satisfying the FDP are optimal in the number of constraint checks. 

Our contributions are the following: we show that even when complying with the
FDP an AC algorithm can still perform unnecessary constraint checks (e.g., AC6++
and AC-7). We thus refute the above optimality claim. We prove that there is a family
of CSP's for which these unnecessary constraint checks can be rather significant. We
also define a new property and show how AC algorithms satisfying it can avoid those
redundant checks. This property is parameterized by a set of inference rules. We give
two such rules and show their validity. We give a general AC algorithm taking into
account the new property and show its correctness. We then use it to orthogonally extend
AC6++ and AC-7 into algorithms maintaining the new property and show how they
improve over the originals on some benchmark and randomly generated problems.

Recently, [vD02] has proposed a particular constraint processing ordering heuristic
that can lead to savings of constraint checks similar to ours. Our idea is independent of
constraint ordering and so leaves more room for variations in constraint ordering
heuristics. This is important theoretically because the optimality claim for FDP compliant
algorithms is wrt analyses that assume the same particular constraint ordering. It is
important in practice because a particular ordering may encode useful knowledge about
the problem domain. On the other hand, efficient implementations of our idea seem to
require particular value orderings, so it may leave less room for value ordering heuristics.

2 CSP and AC: Concepts, Assumptions and Concerns 

A Constraint Satisfaction Problem (CSP) consists of a given finite set of variables
with their corresponding finite domains of values, and a given set of constraints over
the variables. The constraints specify allowed value assignments for the corresponding
variables. A CSP solution is a value assignment for the variables satisfying all the
constraints. Since CSP's are NP-complete [GJ79], usually they are simplified by using
pre-processing techniques, most notably Arc-Consistency (AC). This technique, also
used during the search for a CSP's solutions, involves the removal of some values that
cannot be in any solution.

In AC we are only concerned with binary constraints, so we confine ourselves to 
CSP’s where all the constraints are binary relations; i.e., binary CSP's. 
We can define a (binary) CSP as a tuple $(V, D, C)$ where $V = \{x_1, \ldots, x_n\}$ is a
set of variables, $D = \{D_1, \ldots, D_n\}$ is a set of domains with each $D_i$ specifying the
domain of $x_i \in V$, and $C$ is a set of constraints $C_{ij} \subseteq D_i \times D_j$. We define
the predicate $C_{ij}(v, w)$ to be true iff $(v, w) \in C_{ij}$. Without loss of generality, for each
pair $(x_i, x_j)$ of variables in $V$, we assume that there is at most one constraint
$C_{ij} \in C$. A tuple $(v_1, \ldots, v_n) \in D_1 \times \cdots \times D_n$ is a solution iff for each
$C_{ij} \in C$, $C_{ij}(v_i, v_j)$.

Example 1. Let $V = \{x_1, x_2, x_3\}$, $D = \{D_1, D_2, D_3\}$ with $D_1 = D_3 = \{1, 2\}$
and $D_2 = \{0, 1, 2\}$. Define $C_{12} = \{(v, w) \in D_1 \times D_2 \mid v \leq w\}$ and
$C_{23} = \{(v, w) \in D_2 \times D_3 \mid v \leq w\}$. Consider the CSP $(V, D, \{C_{12}, C_{23}\})$.
The tuples (1, 1, 1), (1, 1, 2), (1, 2, 2), (2, 2, 2) are solutions, but no tuple having 0 as
its second component (i.e., of the form (_, 0, _)) can be a solution. □
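
As a quick sanity check of Example 1, the following sketch enumerates all candidate tuples by brute force and keeps those satisfying every constraint. The dictionary layout and the function name `solutions` are illustrative assumptions, not part of the paper.

```python
from itertools import product

# Domains of Example 1, indexed by variable number (assumed representation).
domains = {1: [1, 2], 2: [0, 1, 2], 3: [1, 2]}

# Each constraint C_ij is given as a predicate; both are "v <= w" in Example 1.
constraints = {
    (1, 2): lambda v, w: v <= w,
    (2, 3): lambda v, w: v <= w,
}


def solutions(domains, constraints):
    """Return every tuple of D_1 x ... x D_n that satisfies all constraints."""
    variables = sorted(domains)
    found = []
    for values in product(*(domains[x] for x in variables)):
        assignment = dict(zip(variables, values))
        if all(check(assignment[i], assignment[j])
               for (i, j), check in constraints.items()):
            found.append(values)
    return found


print(solutions(domains, constraints))
```

Brute-force enumeration is exponential in the number of variables; it is used here only to confirm the four solutions listed in the example.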

2.1 Bidirectionality 

Let $(V, D, C)$ be a CSP. Notice that if $C_{ij} \in C$, augmenting the CSP with a constraint
$C_{ji}$ which is the converse of $C_{ij}$ (i.e., $C_{ji} = C_{ij}^{-1} = \{(w, v) \mid (v, w) \in C_{ij}\}$)
does not restrict the CSP any further, i.e., the CSP's solutions remain the same. Intuitively,
$C_{ij}$ and its converse $C_{ji}$ represent exactly the same constraint except that $C_{ij}$ can be
viewed as a constraint going from $x_i$ to $x_j$ while $C_{ji}$ as going from $x_j$ to $x_i$. The reader
may care to augment the CSP's constraints in Example 1 with the converses $C_{21}$ and
$C_{32}$ and verify that the resulting CSP's solutions are the same as the ones of the original
CSP.

If a CSP has a converse $C_{ji}$ for each of its constraints $C_{ij}$ then it is said to satisfy
the bidirectionality property. Without loss of generality, we shall confine our attention
to CSP's satisfying the bidirectionality property, as is usually done for AC.

2.2 Arc-Consistency and Viability 

As mentioned before, the idea behind AC computation is to eliminate from the domains 
of a given CSP some values that cannot be in any of its solutions. We say that such 
values are not viable. 

Definition 1 (Support and Viability). Let $P = (V, D, C)$ be a CSP where
$D = \{D_1, \ldots, D_n\}$. Let $D'_1 \subseteq D_1, \ldots, D'_n \subseteq D_n$. Suppose that
$C_{ij} \in C$, $v \in D'_i$ and $w \in D'_j$.

We say that $w$ is a support for $v$ (wrt $C_{ij}$) iff $C_{ij}(v, w)$. Also, we say that $v$ is viable
wrt $D'_j$ iff there exists a support for $v$ in $D'_j$. Furthermore, we say that $v$ is viable wrt
$D'_1 \times \cdots \times D'_n$ iff for all $C_{ik} \in C$, $v$ is viable wrt $D'_k$.

Example 2. Let $P$ be the CSP $(V, D, C)$ with $V$ and $D$ as in Example 1 and $C$ as the set
containing the constraints $C_{12}$ and $C_{23}$ in Example 1 plus their converses $C_{21}$ and $C_{32}$.
Notice that 2 is a support in $D_2$ for $1, 2 \in D_1$. Also notice that $0 \in D_2$ is not viable
wrt $D_1$, so it cannot be in any solution to $P$. We shall see that in AC computation,
$0 \in D_2$ must be removed. □
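
The notions of support and viability from Definition 1 can be sketched directly on the CSP of Example 2. The helper names `supports` and `viable` and the dictionary-based representation are assumptions made for illustration.

```python
def leq(v, w):
    """C12 and C23 of Example 1: v <= w."""
    return v <= w


def geq(v, w):
    """The converse constraints C21 and C32."""
    return v >= w


domains = {1: {1, 2}, 2: {0, 1, 2}, 3: {1, 2}}
constraints = {(1, 2): leq, (2, 1): geq, (2, 3): leq, (3, 2): geq}


def supports(i, j, v):
    """All supports in D_j for the value v of D_i (wrt C_ij)."""
    return {w for w in domains[j] if constraints[(i, j)](v, w)}


def viable(i, v):
    """v in D_i is viable iff it has a support along every arc leaving i."""
    return all(supports(a, b, v) for (a, b) in constraints if a == i)


print(supports(1, 2, 1), supports(1, 2, 2), viable(2, 0))
```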

The AC algorithms use a graph whose nodes and arcs correspond to the variables
and constraints, respectively, of the input CSP. Given a CSP $P = (V, D, C)$, define
$G_P$ as the graph with nodes $Nodes(G_P) = \{i \mid x_i \in V\}$ and arcs
$Arcs(G_P) = \{(i, j) \mid C_{ij} \in C\}$. Let us recall the definition of arc-consistency:
Definition 2 (AC Graphs). Let $P = (V, D, C)$ be a CSP where $D = \{D_1, \ldots, D_n\}$
and let $D'_1 \subseteq D_1, \ldots, D'_n \subseteq D_n$.

An arc $(i, j)$ in $G_P$ is said to be arc-consistent wrt $D'_i$ and $D'_j$ iff every $v \in D'_i$ is
viable wrt $D'_j$. Also $G_P$ is said to be arc-consistent wrt $D'_1 \times \cdots \times D'_n$ iff every arc
$(i, j)$ in $G_P$ is arc-consistent wrt $D'_i$ and $D'_j$.

Furthermore, we say that $G_P$ is maximal arc-consistent wrt $\rho = D'_1 \times \cdots \times D'_n$
iff $G_P$ is arc-consistent wrt $\rho$ and there are no $D''_1 \supseteq D'_1, \ldots, D''_n \supseteq D'_n$,
with at least one inclusion strict, such that $G_P$ is arc-consistent wrt
$D''_1 \times \cdots \times D''_n$.

Example 3. Let $P = (V, D, C)$ be as in Example 2. Notice that $G_P$ is not arc-consistent
wrt $D_1 \times D_2 \times D_3$ but it is wrt $\emptyset \times \emptyset \times \emptyset$. Verify that $G_P$ is maximal
arc-consistent wrt $D_1 \times D'_2 \times D_3$ where $D'_2 = D_2 - \{0\}$. □

Computing Arc-Consistency

Given a $P = (V, D, C)$ where $D = \{D_1, \ldots, D_n\}$, the outcome of an AC algorithm
on input $P$ is a $P' = (V, D', C)$ with $D' = \{D'_1, \ldots, D'_n\}$, $D'_k \subseteq D_k$ ($1 \leq k \leq n$)
such that $G_P$ is maximal arc-consistent wrt $D'_1 \times \cdots \times D'_n$.

Usually, an AC algorithm takes each arc $(i, j)$ of $G_P$ and removes from $D_i$ those
values that are not viable wrt $D_j$ (i.e., not having support in $D_j$). This may cause the
viability of some values, previously supported by the ones removed from $D_i$, to be
checked again by the algorithm.
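
To make the remove-and-recheck loop concrete, here is a deliberately naive fixpoint sketch: it revisits every arc until no value is removed. It is not one of the algorithms discussed in this paper (it performs many redundant constraint checks); the function name `naive_arc_consistency` and the data layout are assumptions.

```python
def naive_arc_consistency(domains, constraints):
    """Repeatedly remove unsupported values until no arc changes any domain."""
    current = {i: set(d) for i, d in domains.items()}
    changed = True
    while changed:
        changed = False
        for (i, j), check in constraints.items():
            # Values of D_i with no support left in D_j.
            unsupported = {v for v in current[i]
                           if not any(check(v, w) for w in current[j])}
            if unsupported:
                current[i] -= unsupported
                changed = True
    return current


def leq(v, w):
    return v <= w


def geq(v, w):
    return v >= w


# The CSP of Example 2 (constraints plus their converses).
domains = {1: {1, 2}, 2: {0, 1, 2}, 3: {1, 2}}
constraints = {(1, 2): leq, (2, 1): geq, (2, 3): leq, (3, 2): geq}

print(naive_arc_consistency(domains, constraints))
```

On Example 2 the only value removed is 0 from the second domain, which agrees with Example 3.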

Constraint Checks. The standard comparison measure for the various AC algorithms is
the number of constraint checks performed (i.e., checking whether $C_{ij}(v, w)$ for some
$C_{ij}$ and $v \in D_i$, $w \in D_j$) [Bes94, BR95, BFR95]. It has been shown analytically
and experimentally [BFR95] that if we assume a large cost per constraint check or
demonstrate large enough savings in the number of constraint checks, the constraint
check count will dominate overhead concerns.

In the next section we shall see several properties aiming at reducing substantially 
the number of constraint checks from simple but useful observations. 

Domain Ordering $\prec$. Henceforth, we presuppose a total underlying order $\prec$ on the
CSP's domains as is typically done for AC computation [Bes94, BR95, BFR95]. In practice,
$\prec$ corresponds to the ordering on the data structure representing the domains. In
our examples, we shall take $\prec$ to be the usual "less than" relation $<$ on the natural numbers.

We can now recall the general notion of support lower-bound. Such a notion denotes 
a value before which no support for a given value can be found. 

Definition 3 (Support Lower-Bound). Let $P = (V, D, C)$ be a CSP where
$D = \{D_1, \ldots, D_n\}$ and let $D'_1 \subseteq D_1, \ldots, D'_n \subseteq D_n$. For all $C_{ij} \in C$, the
value $w \in D'_j$ is a support lower-bound in $D'_j$ for $v \in D'_i$ iff for every $w' \in D'_j$ with
$w' \prec w$, $C_{ij}(v, w')$ does not hold.

Example 4. Let $P$ be as in Example 2. Assume that the total ordering $\prec$ on the domains
is $<$. Then $1 \in D_2$ is a support lower-bound in $D_2$ for $1, 2 \in D_1$. □

(Notice that a support lower-bound for v is not necessarily a support of v.) 
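
Definition 3 translates almost literally into a predicate. The sketch below assumes the natural order `<` plays the role of the domain ordering, as in the examples; the function name is hypothetical.

```python
def is_support_lower_bound(check, v, w, domain_j):
    """True iff no value of domain_j that precedes w (under <) supports v."""
    return all(not check(v, earlier) for earlier in domain_j if earlier < w)


def leq(v, w):
    """C12 of Example 1: v <= w."""
    return v <= w


D2 = [0, 1, 2]

# Example 4: 1 is a support lower-bound in D2 for both values of D1.
print(is_support_lower_bound(leq, 1, 1, D2), is_support_lower_bound(leq, 2, 1, D2))
# A support lower-bound need not be a support: 1 does not support 2 under C12.
print(leq(2, 1))
```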
In the next sections, we shall see that simple and general notions, such as support
lower-bound and bidirectionality (which are usually assumed in AC), can substantially
reduce the number of constraint checks.

3 Four Desirable Properties of AC Computation 

Modern AC algorithms satisfy the so-called four desirable properties of an AC
computation given in [BR95, BFR95]. These are very simple and practical properties aiming
at reducing the number of constraint checks while keeping a reasonable space complexity.
In practice, algorithms satisfying these properties have proved to be very successful
[BR95, BFR95].

In the following we assume that $D_1, \ldots, D_n$ represent the current CSP domains
during AC computation. The desirable properties require (of an AC algorithm) that:

1. $C_{ij}(v, w)$ should not be checked if there is $w'$ still in $D_j$ such that $C_{ij}(v, w')$ was
already successfully checked.

2. $C_{ij}(v, w)$ should not be checked if there is $w'$ still in $D_j$ such that $C_{ji}(w', v)$ was
already successfully checked.

3. $C_{ij}(v, w)$ should not be checked if:

a. $C_{ij}(v, w)$ was already successfully or unsuccessfully checked, or

b. $C_{ji}(w, v)$ was already successfully or unsuccessfully checked.

4. The space complexity should be $O(ed)$ where $e$, $d$ are the cardinalities of the set of
constraints and the largest domain, respectively, of the input CSP.

The properties can be justified as follows. An AC algorithm checks $C_{ij}(v, w)$ when
establishing the viability of $v$ wrt $D_j$ (i.e., the algorithm needs to find a support for $v$ in
$D_j$ if any, otherwise it should remove $v$ from $D_i$). Now, the value $v$ in (1) already has
a support, i.e., it is viable, if such a $w'$ still exists in $D_j$; so there is no need to check
whether $C_{ij}(v, w)$. Property (2) can be explained similarly by using bidirectionality.
Property 3.a states that there is no need to do the same constraint check more than
once, and 3.b states that, by bidirectionality, if we have checked $C_{ji}(w, v)$ then we
already know the result of checking $C_{ij}(v, w)$. Property (4) states a restriction on the
space that can be used (see [BR95] for further details).
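
Property 3 can be illustrated with a small wrapper that counts real constraint checks and reuses earlier results, including the result of the converse check. This is only a sketch of the idea: the class name `CheckCounter` is hypothetical, and the dictionary cache stores one entry per checked pair, so it deliberately ignores the space restriction of Property 4.

```python
class CheckCounter:
    """Counts constraint checks, never repeating a check (3.a) or its converse (3.b)."""

    def __init__(self, constraints):
        self.constraints = constraints  # maps an arc (i, j) to a predicate C_ij
        self.cache = {}                 # remembered results of earlier checks
        self.checks = 0                 # number of checks actually performed

    def check(self, i, j, v, w):
        key = (i, j, v, w)
        if key not in self.cache:
            self.checks += 1
            result = self.constraints[(i, j)](v, w)
            self.cache[key] = result
            # By bidirectionality, C_ji(w, v) has the same truth value.
            self.cache[(j, i, w, v)] = result
        return self.cache[key]


counter = CheckCounter({
    (1, 2): lambda v, w: v <= w,
    (2, 1): lambda v, w: v >= w,
})
counter.check(1, 2, 1, 0)   # a real check
counter.check(2, 1, 0, 1)   # answered from the cache (3.b)
counter.check(1, 2, 1, 0)   # answered from the cache (3.a)
print(counter.checks)
```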

The AC algorithm AC-3 does not satisfy Properties (1-3); AC-4 does not satisfy
Properties 1, 2, 3.b, and 4; AC-6 does not satisfy Properties 2 and 3.b (the ones using
bidirectionality); AC-Inference does not comply with Property 4. The modern
algorithms AC6++ and AC-7 preserve the four properties.

The AC6++ and AC-7 algorithms differ mainly in the order in which values and arcs
are considered during AC computation. The latter propagates the effects of removing a
value as soon as possible (i.e., it reconsiders the viability of the values supported by the
removed one). In practice, this heuristic seems to save unnecessary constraint checks.
Experimentally, AC-7 has been shown to outperform AC6++.

In [BR95] it is also claimed that the four desirable properties guarantee a minimal
number of constraint checks. This claim is in the context of CSP's where nothing is
known about the particular semantics of the constraints and wrt the order in which
values, variables and arcs are considered during AC computation. Hence, AC6++ performs
a minimal number of constraint checks according to the order used by this algorithm,
but still it may perform more constraint checks than AC-7, which uses a different order.

The above four properties are of great practical significance. Nevertheless, we
believe that they are not enough to guarantee the minimal number of constraint checks,
thus contradicting the claim mentioned above. In the next section, we shall show that
even when complying with the four desirable properties, an AC algorithm can still
perform a substantial number of unnecessary constraint checks.

4 New Desirable Property and Non-viability Deductions

A drawback of the four desirable properties is that they allow checking $C_{ij}(v, w)$ even
when the non-viability of $v$ or $w$ could have been deduced by using only the general
notions of bidirectionality and support lower-bound, and information about previous
constraint checks, i.e., without using any particular semantic properties of the constraints
under consideration. In our view, the check of $C_{ij}(v, w)$ under the above conditions
would be unnecessary.

Let us illustrate the above with the following example: 

Example 5. Let $P$ be the CSP defined in Example 4. Suppose that during AC computation
an algorithm satisfying the four desirable properties checks, first of all, the viability
of the values in $D_1$ and immediately after the viability of the values in $D_3$. Furthermore,
suppose that the search for support in $D_2$ is done according to $\prec$.

After establishing the viability of all the values in $D_1$, the algorithm has checked
that for every value $v \in D_1$, $C_{12}(v, 0)$ does not hold. Furthermore, after establishing the
viability of the values in $D_3$, the algorithm has checked that for every value $w \in D_3$,
$C_{32}(w, 0)$ holds.

Nevertheless, notice that for any $w \in D_3$ checking $C_{32}(w, 0)$ is really unnecessary,
because after checking for the viability of the values in $D_1$ one can deduce that $0 \in D_2$
is not viable.

Here is a proof of the non-viability of $0 \in D_2$: Recall from Example 4 that $1 \in D_2$
is a support lower-bound in $D_2$ for $1, 2 \in D_1$. Now $0 \prec 1$, so after checking for
the viability of $1, 2 \in D_1$, we can conclude from Definition 3 that $\neg C_{12}(1, 0)$ and
$\neg C_{12}(2, 0)$. By bidirectionality $\neg C_{21}(0, 1)$ and $\neg C_{21}(0, 2)$. Hence we can deduce, from
Definition 1, that $0 \in D_2$ is not viable. □
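
The scenario of Example 5 can be replayed in a few lines. The sketch below assumes that supports are searched in increasing order and that every value of the first domain has a support; the helper `first_support` and the variable names are illustrative only.

```python
def leq(v, w):
    """C12 and C23 of Example 1: v <= w."""
    return v <= w


def geq(v, w):
    """The converse constraints C21 and C32."""
    return v >= w


D1, D2, D3 = [1, 2], [0, 1, 2], [1, 2]


def first_support(check, v, ordered_domain):
    """Scan the domain in increasing order; return the first support of v
    (or None) together with the list of values that were checked."""
    checked = []
    for w in ordered_domain:
        checked.append(w)
        if check(v, w):
            return w, checked
    return None, checked


# Step 1: viability of D1 wrt D2. The first support found for v is also a
# support lower-bound for v, because every earlier value failed the check.
lower_bounds = {v: first_support(leq, v, D2)[0] for v in D1}
least_lower_bound = min(lower_bounds.values())

# Deduction: values of D2 below every lower bound have no support in D1.
deduced_non_viable = [w for w in D2 if w < least_lower_bound]

# Step 2: viability of D3 wrt D2, without and with the deduction.
pruned_D2 = [w for w in D2 if w not in deduced_non_viable]
checks_without = [(w, u) for w in D3 for u in first_support(geq, w, D2)[1]]
checks_with = [(w, u) for w in D3 for u in first_support(geq, w, pruned_D2)[1]]
unnecessary = [pair for pair in checks_without if pair[1] in deduced_non_viable]

print(deduced_non_viable, unnecessary, checks_with)
```

Without the deduction every value of the third domain is first checked against 0, a value that is about to be removed; with it, the search starts directly at 1.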

4.1 Unnecessary Constraint Checks 

One can verify that both AC6++ and AC-7 may indeed perform the unnecessary
constraint checks mentioned in the above example. Also notice that the number of
unnecessary constraint checks in the above example is $d = |D_3|$. However, as shown below, one
can generalize Example 5 to a family of CSP's for which the number of unnecessary
constraint checks is about $ed^2$, where $e$ is the number of constraints and $d$ is the size of
the largest domain.

In the following theorem, by unnecessary constraint check we mean that the check 
can be avoided by using only bidirectionality, the notion of support lower-bound, and 
information about previous constraint checks. 
Theorem 1. There is a family of CSP's for which the number of unnecessary constraint
checks during AC computation, even when complying with the four desirable properties,
can be $\Omega(ed^2)$, where $e$ is the number of constraints and $d$ the size of the largest domain.

Proof (Outline). Let $P = (V, D, C)$ where $D = \{D_1, \ldots, D_n\}$ with all the domains
being of the same even size. Decree that for any $(i, j) \in Arcs(G_P)$, $i < j$, the first half
of the values in $D_j$ (according to the domain ordering $\prec$) are not viable wrt $D_i$.

Let us suppose that we have an AC algorithm satisfying the four desirable properties.
Assume that the algorithm checks first of all the viability of the values in $D_1$ wrt
$D_n$ (i.e., it searches supports in $D_n$ for the values in $D_1$), then $D_2$ wrt $D_n$, ..., $D_{n-1}$
wrt $D_n$, and then $D_1$ wrt $D_{n-1}$, $D_2$ wrt $D_{n-1}$ and so on. Furthermore, suppose that
the search for support is done according to $\prec$.

After establishing the viability of the values in $D_1$, it is possible to deduce, by
using the notions of bidirectionality and support lower-bound (as in Example 5), that
the values in the first half of $D_n$ are not viable. Now, for each $k = 2, \ldots, n-1$,
the four desirable properties do not prevent the algorithm from checking unnecessarily
$C_{kn}(v, w)$ for every $v \in D_k$ and every $w$ in the first half of $D_n$. The same happens for
$k = 2, \ldots, n-2$ wrt $D_{n-1}$, and so on. It then follows that the algorithm can perform
$\Omega(ed^2)$ unnecessary constraint checks. □
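
The counting behind the last step can be spelled out as follows. This is a sketch under assumptions that the outline leaves implicit: the constraint graph is complete with one constraint per pair of variables, so $e = n(n-1)/2$; every domain has even size $d$; and no value of $D_k$ has been removed when $D_k$ is checked against a later domain $D_j$. For each target $D_j$ ($j = 3, \ldots, n$) the domains $D_2, \ldots, D_{j-1}$ each contribute $d \cdot d/2$ avoidable checks.

```latex
\begin{align*}
\#\,\text{unnecessary checks}
  &\;\geq\; \sum_{j=3}^{n} (j-2)\cdot d\cdot\frac{d}{2}
   \;=\; \frac{d^{2}}{2}\cdot\frac{(n-1)(n-2)}{2}, \\[4pt]
e = \frac{n(n-1)}{2}
  &\;\Longrightarrow\;
   \frac{d^{2}}{2}\cdot\frac{(n-1)(n-2)}{2}
   \;=\; \frac{n-2}{2n}\, e\, d^{2}
   \;\geq\; \frac{1}{6}\, e\, d^{2}
   \quad (n \geq 3),
\end{align*}
```

which is $\Omega(ed^2)$ under the stated assumptions.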

4.2 New Desirable Property 

In order to avoid unnecessary constraint checks of the kind above, we could suggest the
following new desirable property: $C_{ij}(v, w)$ should not be checked if it can be deduced
via bidirectionality and the notion of support lower-bound that $v$ or $w$ is not viable.
We shall use "deduce" in a loose sense of the word: We mean that one can conclude,
without performing further constraint checks, that $v$ (or $w$) is not viable.

Nevertheless, there could be many other ways of deducing non-viability (e.g., special
properties of constraints, domains, etc.). Hence, we find it convenient to define the
new desirable property wrt a fixed non-viability deduction system $S$, i.e., a set of
inference rules that allows us to deduce the non-viability of some values. We assume that
whether a given value can be deduced in $S$ as non-viable can always be decided. The
fifth desirable property wrt a fixed $S$ can be stated as follows:

5. $C_{ij}(v, w)$ should not be checked if it can be deduced, in the underlying
non-viability inference system $S$, that $v$ or $w$ is not viable.

Of course some deduction systems may be of little help. For example, if $S$ is the
empty set of rules, then both AC6++ and AC-7 trivially satisfy the
fifth property. Another example is a deduction system in which deciding the non-viability
of a given value cannot be done in $O(ed)$ space (see the fourth desirable property).
Next we give more helpful but general deduction systems (inference rules).

Non-viability Deductions 

In the following properties, we give two simple and general inference rules for
non-viability deduction to avoid unnecessary constraint checks of the kind illustrated in
Example 5 and stated in Theorem 1.
Property 1 (Support Lowest-Bound). Let $P = (V, D, C)$ be a CSP where
$D = \{D_1, \ldots, D_n\}$ and let $D'_1 \subseteq D_1, \ldots, D'_n \subseteq D_n$. Suppose that $SLB_{ij}$ is the
least value (wrt $\prec$) in a given set containing a support lower-bound in $D'_j$ for each
$v \in D'_i$. The following is a valid non-viability inference rule:

If $w \in D'_j$ and $w \prec SLB_{ij}$ then $w$ is not viable wrt $D'_i$.

Proof. By using the notions of support lower-bound (Definition 3) and bidirectionality. □

The above property says that a value can be deduced as non-viable if it is less than
every support lower-bound for (all the values of) a given domain. The property can be
implemented by using an array SLB such that SLB[i, j] keeps the least support lower-bound
in $D_j$ for the values in $D_i$. We shall discuss this issue in Section 5.1.

Property 2 (Support Upper-Bound Cardinality). Let $P = (V, D, C)$ be a CSP where
$D = \{D_1, \ldots, D_n\}$ and let $D'_1 \subseteq D_1, \ldots, D'_n \subseteq D_n$. Suppose that $sub_{ij}(v)$
denotes an upper bound on the number of supports of $v \in D'_i$ in $D'_j$. The following is a
valid non-viability inference rule:

If $sub_{ij}(v) = 0$ then $v$ is not viable wrt $D'_j$.

Proof. Immediate. □

We can implement the above property by having counters of the form $sub_{ij}(v)$
initially set to $|D_j|$. Then counter $sub_{ij}(v)$ decreases each time a check of $C_{ij}(v, w)$ is
found to be false, a support $w' \in D_j$ for $v$ is eliminated, or some value supported by $v$
is eliminated. Once $sub_{ij}(v) = 0$ we can proceed as if $v$ did not exist in $D_i$. We shall
discuss this in Section 5.1.

In the next sections we shall also illustrate experimentally that, despite their
simplicity, the above deduction rules indeed provide a substantial reduction in the number of
constraint checks for CSP's where nothing is known about the particular semantics of
the constraints.

5 AC Algorithms with Non-viability Deductions 

In this section we first present a new generic AC algorithm, here called AC[S], which is
parametric in an underlying non-viability deduction system $S$. The algorithm is based
on AC-5 and it can be instantiated to produce other AC algorithms such as AC-4, AC-5,
AC-3, AC6++ and AC-7.

The generic AC algorithm immediately removes the values deduced as being non-viable.
This can be justified as follows: If propagating the consequences of removing
a value as soon as possible is a good heuristic (as shown by AC-7 [BFR95]) then it is
reasonable to perform removals as soon as possible. The non-viability deductions can
also help to promptly detect values that must be removed.

In the following we assume that $P = (V, D, C)$ represents the CSP on input of
which AC[S] is to perform AC. Furthermore, we assume that $D_1, \ldots, D_n$ represent the
current CSP's domains during the AC computation.
Most AC algorithms use a waiting list $Q$ containing elements that have been
removed and for which we need to propagate the effects of their elimination. In AC[S],
$Q$ contains elements of the form $((i, j), w)$, where $(i, j)$ is an arc and $w$ is a value which
has been removed from $D_j$, thus making us reconsider the viability of some values in
$D_i$ supported by $w$.

Like AC-5, AC[S] is parametric in the procedures ArcCons and LocalArcCons (Figure 1)
whose implementation can give rise to various AC algorithms. The procedure
ArcCons$(i, j, \Delta_i, \Delta_j)$ computes the set of values $\Delta_i \subseteq D_i$ without support in $D_j$
and the set of values $\Delta_j \subseteq D_j$ deduced, wrt $S$, as being non-viable. The procedure
LocalArcCons$(i, j, w, \Delta_i, \Delta_j)$ is similar except that it computes the set of values
$\Delta_i \subseteq D_i$ without support in $D_j$ which were previously supported by a value $w$ removed
from $D_j$.



procedure ArcCons(in $i, j$; out $\Delta_i, \Delta_j$)
    Pre:  $(i, j) \in Arcs(G_P)$
    Post: $\Delta_i = \{v \in D_i \mid \forall w \in D_j : \neg C_{ij}(v, w)\}$
          $\Delta_j = \{w \in D_j \mid P \vdash_S w \text{ is not viable}\}$

procedure LocalArcCons(in $i, j, w$; out $\Delta_i, \Delta_j$)
    Pre:  $(i, j) \in Arcs(G_P) \wedge w \in D_j$
    Post: $\Delta_i = \{v \in D_i \mid C_{ij}(v, w) \wedge \forall w' \in D_j : \neg C_{ij}(v, w')\}$
          $\Delta_j = \{w' \in D_j \mid P \vdash_S w' \text{ is not viable}\}$

Fig. 1. The ArcCons and LocalArcCons procedures. Notation $P \vdash_S E$ means that $E$ can be
deduced in the inference system $S$ from the current information about $P$.



The AC[S] algorithm (see Figure 2) has two phases. In the first one, called the
initialization phase (Lines 1-7), AC[S] makes each arc $(i, j)$ arc-consistent wrt the
current $D_i$ and $D_j$. In the second one, called the propagation phase (Lines 8-15), it
propagates the effects of all the removed values. Notice that the removed values are put in $Q$
and they stay there until the effects of their elimination are propagated.
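
The two phases can be sketched in Python as follows. This is an illustrative reading of the generic algorithm, not the authors' implementation: the deduction system is passed as a function `deduce(domains, i, j)` returning values of the j-th domain deduced non-viable (an assumed interface), supports are recomputed naively rather than with the data structures of AC6++ or AC-7, and all names are hypothetical.

```python
def ac_generic(domains, constraints, deduce=lambda doms, i, j: set()):
    """Generic arc-consistency sketch, parametric in a deduction function."""
    doms = {i: set(d) for i, d in domains.items()}
    arcs = list(constraints)

    def unsupported(i, j, candidates):
        """Candidates from D_i that have no support in the current D_j."""
        check = constraints[(i, j)]
        return {v for v in candidates if not any(check(v, w) for w in doms[j])}

    def remove_and_enqueue(queue, i, delta):
        """Queue ((k, i), v) for every arc (k, i) and removed v, then shrink D_i."""
        for (k, target) in arcs:
            if target == i:
                queue.update(((k, i), v) for v in delta)
        doms[i] -= delta

    queue = set()
    # Initialization phase: one ArcCons-style pass over every arc.
    for (i, j) in arcs:
        delta_i = unsupported(i, j, doms[i])
        delta_j = deduce(doms, i, j) & doms[j]
        remove_and_enqueue(queue, i, delta_i)
        remove_and_enqueue(queue, j, delta_j)
    # Propagation phase: LocalArcCons-style rechecks of values w used to support.
    while queue:
        (i, j), w = queue.pop()
        check = constraints[(i, j)]
        delta_i = unsupported(i, j, {v for v in doms[i] if check(v, w)})
        delta_j = deduce(doms, i, j) & doms[j]
        remove_and_enqueue(queue, i, delta_i)
        remove_and_enqueue(queue, j, delta_j)
    return doms


def leq(v, w):
    return v <= w


def geq(v, w):
    return v >= w


# The CSP of Example 2 and a small inconsistent CSP.
example = ac_generic({1: {1, 2}, 2: {0, 1, 2}, 3: {1, 2}},
                     {(1, 2): leq, (2, 1): geq, (2, 3): leq, (3, 2): geq})
inconsistent = ac_generic({1: {2}, 2: {1}}, {(1, 2): leq, (2, 1): geq})
print(example, inconsistent)
```

With the default (empty) deduction function the sketch behaves like a plain propagation-based AC algorithm; a sound `deduce` only causes removals to happen earlier.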

The following theorem states that the outcome of AC[S] on a CSP $P = (V, D, C)$
where $D = \{D_1, \ldots, D_n\}$, is a CSP $P' = (V, D', C)$ with $D' = \{D'_1, \ldots, D'_n\}$,
$D'_k \subseteq D_k$ ($1 \leq k \leq n$) such that $G_P$ is maximal arc-consistent wrt $D'_1 \times \cdots \times D'_n$.

Theorem 2 (Correctness of AC[S]). The algorithm AC[S], Figure 2, is correct wrt its
precondition and postcondition.

Algorithm AC[S](in-out $P$)
    Pre:  $P$ is a CSP $(V, D, C)$ with $D = \{D_1, \ldots, D_n\}$
    Post: $G_{P_0}$ is maximal arc-consistent wrt $D_1 \times \cdots \times D_n$

     1. $Q \leftarrow \emptyset$
     2. for each $(i, j) \in Arcs(G_P)$ do
     3.     ArcCons$(i, j, \Delta_i, \Delta_j)$
     4.     $Q \leftarrow Q \cup \{((k, i), v) \mid (k, i) \in Arcs(G_P) \wedge v \in \Delta_i\}$
     5.     $Q \leftarrow Q \cup \{((k, j), w) \mid (k, j) \in Arcs(G_P) \wedge w \in \Delta_j\}$
     6.     $D_i \leftarrow D_i - \Delta_i$
     7.     $D_j \leftarrow D_j - \Delta_j$
     8. while $Q \neq \emptyset$ do
     9.     choose $((i, j), w) \in Q$
    10.     LocalArcCons$(i, j, w, \Delta_i, \Delta_j)$
    11.     $Q \leftarrow Q - \{((i, j), w)\}$
    12.     $Q \leftarrow Q \cup \{((k, i), v) \mid (k, i) \in Arcs(G_P) \wedge v \in \Delta_i\}$
    13.     $Q \leftarrow Q \cup \{((k, j), w) \mid (k, j) \in Arcs(G_P) \wedge w \in \Delta_j\}$
    14.     $D_i \leftarrow D_i - \Delta_i$
    15.     $D_j \leftarrow D_j - \Delta_j$

Fig. 2. The generic AC[S] algorithm. Notation $P_0$ denotes the CSP $P$ when input to the
algorithm.

Proof (Outline). Suppose that the AC[S] algorithm runs on input $P = (V, D, C)$ with
$D = \{D_1, \ldots, D_n\}$. Let $\rho^f = D_1^f \times \cdots \times D_n^f$ be such that $G_P$ is maximal
arc-consistent wrt $\rho^f$.

Let $D_i^0$ be the initial $D_i$ and $D_i^k$ with $k > 0$ be the current $D_i$ after the $k$-th
elimination of a $\Delta_m$ (Lines 6-7 and 14-15) from some $D_m^{k-1}$. Let
$\rho^k = D_1^k \times \cdots \times D_n^k$. It is sufficient to prove that AC[S] terminates with a final
$\rho^k$, $k \geq 0$, such that $\rho^k = \rho^f$.

From the specification of ArcCons and LocalArcCons, it is easy to verify that any
value is removed from $D_j^{k-1}$ only if it is found non-viable wrt $\rho^{k-1}$; either it did not
have a support or it was deduced in $S$ as being non-viable. Now one can prove by
induction on $k$ that if $v \notin D_j^k$ then $v \notin D_j^f$. So, the first invariant of AC[S] is the
following:

$\rho^f \subseteq \rho^k \subseteq \rho^{k-1} \subseteq \cdots \subseteq \rho^0$. (1)

Also, from the specification of ArcCons and LocalArcCons, one can verify that after
the initialization phase (Lines 1-7) every value has a support in the current domains or
in the waiting list $Q$. More precisely, let $Val(Q)$ be the set of domain values appearing
in $Q$; during the propagation phase, the second invariant of AC[S] is:

$\forall (i, j) \in G_P, \forall v \in D_i^k, \exists w \in D_j^k \cup Val(Q) : C_{ij}(v, w)$. (2)

The algorithm terminates when $Q = \emptyset$. Hence, from Definition 2 and the second
invariant, $G_P$ is arc-consistent wrt the $\rho^k$ at termination time. Furthermore, from the
first invariant we have $\rho^f \subseteq \rho^k$. It then follows that $G_P$ is maximal arc-consistent wrt
the $\rho^k$ at termination time, as wanted.

It only remains to prove termination; i.e., that the propagation phase terminates
(Lines 8-15). Observe that once an element $((i, j), w)$ is taken from $Q$ it is never put
back in $Q$. In each iteration of the propagation phase, an element is taken from $Q$.
Furthermore, there can be no more than $ed$ elements in $Q$, where $e = |C|$ and $d$ is the
size of the largest domain. Hence, $ed$ is an upper-bound on the number of iterations of
the propagation phase. □

5.1 Implementation: AC6-3+ and AC-7+ 

We have implemented the non-viability inference rules given in Properties 1 and 2 for
AC6++ and AC-7. We call AC6-3+ the algorithm that (orthogonally) extends AC6++
with the support lowest-bound inference rule (Property 1) and AC-7+ the one that
(orthogonally) extends AC-7 with the support upper-bound cardinality rule (Property 2).

The algorithm AC6-3+ has a three-dimensional array slb used exactly as in AC6++.
Each entry slb[i, j, v] represents a support lower-bound for $v \in D_i$ in $D_j$ (in fact the
greatest one found so far by the algorithm; see [BR95] for more details). In addition,
AC6-3+ has a two-dimensional array SLB. Each entry SLB[i, j] keeps the least of all
slb[i, j, v] for all $v \in D_i$. Justified by Property 1, every value $w \in D_j$ less than
SLB[i, j] is removed from $D_j$ before checking any constraint of the form $C_{kj}(u, w)$ or
$C_{jk}(w, u)$.
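
A minimal sketch of the pruning step justified by Property 1, assuming the natural order on values and a per-arc dictionary in place of the slb and SLB arrays. The function names are hypothetical, and the bookkeeping that keeps slb up to date inside AC6++ is not modelled.

```python
def least_support_lower_bound(slb_ij):
    """SLB[i, j]: the least of the recorded lower bounds slb[i, j, v], v in D_i.

    Assumes slb_ij is non-empty, i.e. D_i still has at least one value.
    """
    return min(slb_ij.values())


def prune_with_slb(domain_j, slb_ij):
    """Split D_j into the values kept and those deduced non-viable (Property 1)."""
    bound = least_support_lower_bound(slb_ij)
    kept = [w for w in domain_j if w >= bound]
    removed = [w for w in domain_j if w < bound]
    return kept, removed


# Example 4 and Example 5: lower bounds found in D2 for the values 1 and 2 of D1.
print(prune_with_slb([0, 1, 2], {1: 1, 2: 2}))
```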

As for AC-7+, we use an additional three-dimensional array sub. Each array entry
sub[i, j, v] represents an upper bound on the number of supports in $D_j$ for $v \in D_i$.
Initially, each sub[i, j, v] is set to $|D_j|$. Then the algorithm decreases sub[i, j, v] each
time $C_{ij}(v, w)$ is found to be false, a support $w' \in D_j$ for $v$ is eliminated, or some value
supported by $v$ is eliminated. Justified by Property 2, $v$ is removed from $D_i$ whenever
sub[i, j, v] becomes zero.
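
The counter idea can be sketched for a single arc as below. The class name and its `discount` method are hypothetical. The sketch covers only the first two triggers (a failed check and the elimination of a support), and it keeps a set of already discounted values per entry so that nothing is counted twice; that guard is a simplification which costs more space than the $O(ed)$ budget of Property 4.

```python
class SupportUpperBound:
    """Upper bound sub[v] on the number of supports in D_j for each v in D_i."""

    def __init__(self, domain_i, domain_j):
        self.sub = {v: len(domain_j) for v in domain_i}
        # Values of D_j already discounted for v, so nothing is counted twice.
        self.discounted = {v: set() for v in domain_i}

    def discount(self, v, w):
        """Record that w cannot (or can no longer) support v.

        Returns True when the bound reaches zero, i.e. v is deduced non-viable.
        """
        if w not in self.discounted[v]:
            self.discounted[v].add(w)
            self.sub[v] -= 1
        return self.sub[v] == 0


# The value 3 under the constraint v <= w has no support in [0, 1, 2].
bound = SupportUpperBound([3], [0, 1, 2])
outcomes = [bound.discount(3, w) for w in (0, 0, 1, 2)]
print(outcomes, bound.sub)
```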

Both extended algorithms have the same worst-case complexities as their predecessors
AC6++ and AC-7. More precisely, both AC6-3+ and AC-7+ have $O(ed^2)$ worst-case
time complexity and $O(ed)$ worst-case space complexity, where $e$ is the number
of constraints and $d$ the size of the largest domain [BR95, BFR95]. They also satisfy
the four desirable properties as a result of being orthogonal extensions of AC6++ and
AC-7. Furthermore, they satisfy the new desirable property wrt their underlying
non-viability deduction rules, thus they can save some unnecessary constraint checks. In the
next section, we shall show experimental evidence of these savings.

6 Experimental Results 

Here we show some of our experimental results obtained from CSPs typically used 
to compare AC algorithms [Bes94, BR95, BFR95]. We compared AC6++ vs AC6-3+ 
and AC-7 vs AC-7+ on benchmark CSPs [Van89] as well as randomly generated CSPs 
[Bes94, BR95, BFR95]. Each comparison was performed wrt fifty instances of each 
problem. 

For the ZEBRA problem [Van89] we obtained the following results in terms of 
constraint checks (ccs): 

AC6++ : 717 ccs AC-7 : 640 ccs 

AC6-3+ : 639 ccs AC-7+ : 594 ccs 

As for the combinatorial problem suggested in [Van89], we obtained: 

AC6++ : 977 ccs AC-7 : 966 ccs 

AC6-3+ : 783 ccs AC-7+ : 826 ccs 

For the randomly generated problems, following [Bes94, BR95, BFR95], we took 
the following as parameters of the generation: the number of variables, the size of the 
domains, the probability of having a constraint between any two variables, and the 
probability for any two values to support each other. In Figure 3 we show some 
results corresponding to the values of the parameters used in the experiments of [BR95]. On 
average, the reduction in the number of constraint checks by AC6-3+ 
and AC-7+ wrt AC6++ and AC-7 (respectively) was about 10%. We also observed that 
the number of values deduced as being non-viable was proportional to the reduction 
in the number of constraint checks. Moreover, even when the number of non-viability 
deductions was small, the number of constraint checks was significantly reduced. 

Fig. 3. AC6++ vs AC6-3+ (left) and AC-7 vs AC-7+ (right) on randomly generated problems with 
20 variables, at most 5 values per domain, and 30% probability of having a constraint between 
two variables. The horizontal axis represents the probability percentage that two values support 
each other. The vertical axis represents the number of constraint checks. 



7 Concluding Remarks 

We have shown that, despite providing a remarkable reduction in the number of cons- 
traint checks, the four desirable properties of AC computation still allow a substantial 
number of unnecessary constraint checks - in the sense that the checks could have been 
avoided by deducing, only from general constraint properties and previous constraint- 
checks, the non-viability of some values. We also suggested a new desirable property 
which provides a further substantial reduction in the number of constraint checks. We 
modified some of the best known AC algorithms to satisfy the property and showed 
experimentally the benefits of the modified algorithms. 

Since the reduction in the number of constraint checks by the new property depends 
on the non-viability of values, we believe it is practical for problems with strong 
structural properties (i.e., strong constraints, large domains, etc.). As future work, we plan to 
identify and implement more inference rules to deduce non-viability efficiently. 

Abstract. In computing circumscription by logic programming, 
circumscription is usually transformed into some target logic program whose 
answer sets (or stable models) yield the Herbrand models of 
circumscription. In this paper, we propose a new method of computing 
models of prioritized circumscription in answer set programming, which is 
correct and more efficient than previous approaches. The basic idea of 
our approach is to transform a given circumscription into a general 
extended disjunctive program whose answer sets (if any exist) yield models 
strictly preferred to a given candidate model with respect to the preorder 
$\leq^{P^1 > \cdots > P^k; Z}$. Hence its inconsistency enables us to determine models 
of prioritized circumscription. Based on our new method, a 
circumscriptive model generator has already been implemented. Its performance on 
some interesting examples of circumscription is also addressed. 



1 Introduction 

Circumscription [14, 15] was proposed to formalize commonsense reasoning 
under incomplete knowledge. So far, many studies have explored 
the use of logic programming for the automation of circumscription. 
Such approaches are based on the relationship between the semantics of 
circumscription and the semantics of the target logic programs. 

Gelfond and Lifschitz [8] were the first to propose a computational method 
for a restricted class of prioritized circumscription, which compiles 
circumscriptive theories into stratified logic programs. Though their method is 
computationally efficient, the applicable class is too limited. Afterwards, Sakama 
and Inoue proposed two methods. The first one [19] is for a class of parallel 
circumscription without function symbols, and compiles circumscription into a 
normal disjunctive program whose semantics is given by stable models. The 
second one [20, 21] is for a class of parallel circumscription as well as prioritized 
circumscription without function symbols, and compiles circumscription into 
a prioritized logic program [20, 21] (PLP, for short) whose semantics is given 
by preferred answer sets. However, only the semantic issues are addressed for the 
second one, and a procedure to compute preferred answer sets of PLPs was left 
as future work. In this situation, Wakaki and Satoh [23] proposed a 
method for a class of parallel circumscription as well as prioritized circumscription 
without function symbols, which compiles circumscription into an extended 
logic program whose semantics is given by answer sets. Their method extends 
Sakama and Inoue's first method so that it becomes applicable to a class of 
prioritized circumscription without function symbols. However, the number of rules 
becomes large when the input size grows. Another problem in both Sakama and 
Inoue's first method [19] and Wakaki and Satoh's method [23] is that they have to 
compute the characteristic clauses [10] of the given axiom set in advance, which 
must be done by some consequence-finding procedure outside 
answer set programming. Recently, Wakaki et al. [24] proposed a procedure for 
computing preferred answer sets of Sakama and Inoue's PLP in answer set 
programming (ASP, for short), one of whose purposes was to implement Sakama 
and Inoue's second method for computing circumscription mentioned above. In 
implementing the procedure [24], it was found that Sakama and Inoue's second method 
[21, Theorems 3.8 and 3.9] (or [20, Lemma 3.7 and Theorem 3.8]) is not correct 
for computing models of circumscription. 

Thus, the motivation of this research is to develop a correct and efficient 
method to compute models of parallel and prioritized circumscription in ASP. 
To achieve correctness, we establish a new translation of prioritized 
circumscription into a logic program, which is inspired by a translation technique 
used in [24]. Roughly speaking, the basic idea of our approach is to translate 
a given circumscription into a general extended disjunctive program whose 
answer sets (if any exist) yield models strictly preferred to a given candidate model 
with respect to the preorder $\leq^{P^1 > \cdots > P^k; Z}$. Hence if it is inconsistent, we can 
decide that the candidate model is a model of prioritized circumscription. We give 
the soundness and completeness theorems for our method of computing models 
of circumscription. To gain efficiency, we do not rely on any preprocessing 
other than ASP. Thus, the advantage of our new method over the previous 
methods of [19, 23] is that we do not have to compute the characteristic clauses, 
whose time complexity is exponential in the propositional case. Our evaluation 
results suggest that, as far as the authors know, our approach is more efficient than 
previously implemented procedures such as the method using integer 
programming [16]. 

The rest of this paper is structured as follows. In Section 2, we provide pre- 
liminaries. In Section 3, we present our translated logic program, soundness and 
completeness theorems for our method, the procedure of computing models of 
circumscription, and the experimental results under the current implementation 
of our procedure. We finish this paper by comparing our approach with related 
work in Section 4. 

2 Preliminaries 

We briefly review the basic notions used throughout this paper. 

2.1 Circumscription 

Let $A(P, Z)$ be a sentence of a first order theory, $P$ be a tuple of minimized 
predicates and $Z$ be a tuple of variable predicates. $Q$ denotes the rest of the predicates 
occurring in $A$, called the fixed predicates. Then parallel circumscription of $P$ 
for $A(P, Z)$ with $Z$ varied [14] is defined by a second order formula as follows: 

$$Circum(A; P; Z) \stackrel{\mathrm{def}}{=} A(P, Z) \land \neg\exists p z\,(A(p, z) \land p < P),$$

where $p$, $z$ are tuples of predicate variables which have the same arities as $P$, $Z$ respectively. 
We write $Circum(A; P)$ when the last argument $Z$ in $Circum(A; P; Z)$ is empty. 
If $P$ is decomposed into disjoint parts $P^1, \ldots, P^k$, and the members of $P^i$ are 
assigned a higher priority than the members of $P^j$ for $i < j$, then prioritized 
circumscription of $P^1 > \cdots > P^k$ for $A$ with $Z$ varied is denoted by 
$Circum(A; P^1 > \cdots > P^k; Z)$, which is also defined by a second order formula. 
Due to the space limitation, its definition is omitted (see [14]). Prioritized 
circumscription for $k = 1$ coincides with parallel circumscription. 

Definition 1 Let $P$, $Z$, $Q$ be tuples of minimized predicates, variable predicates 
and fixed predicates respectively. For a structure $M$, let $|M|$ be its universe and 
$M[\![K]\!]$ the interpretations of all individual, function, and predicate constants $K$ 
in the language. For any two structures $M_1$, $M_2$, we write $M_1 \leq^{P;Z} M_2$ if 

(i) $|M_1| = |M_2|$, 

(ii) $M_1[\![Q]\!] = M_2[\![Q]\!]$, 

(iii) $M_1[\![P]\!] \subseteq M_2[\![P]\!]$. 

Definition 2 Let $P^1, \ldots, P^k$ be $k$ disjoint parts of $P$ in Definition 1. For any 
two structures $M_1$, $M_2$, we write $M_1 \leq^{P^1 > \cdots > P^k; Z} M_2$ if 

(i) $|M_1| = |M_2|$, 

(ii) $M_1[\![Q]\!] = M_2[\![Q]\!]$, 

(iii) a. $M_1[\![P^1]\!] \subseteq M_2[\![P^1]\!]$, 

b. For every $i \leq k$, if for every $1 \leq j \leq i - 1$, $M_1[\![P^j]\!] = M_2[\![P^j]\!]$, 
then $M_1[\![P^i]\!] \subseteq M_2[\![P^i]\!]$. 

The preorder $\leq^{P^1 > \cdots > P^k; Z}$ for $k = 1$ coincides with $\leq^{P;Z}$. 

We write $M_1 <^{P^1 > \cdots > P^k; Z} M_2$ if $M_1 \leq^{P^1 > \cdots > P^k; Z} M_2$ and $M_2 \not\leq^{P^1 > \cdots > P^k; Z} M_1$. 
With respect to $\leq^{P^1 > \cdots > P^k; Z}$, we say that $M_1$ is preferred to $M_2$ if 
$M_1 \leq^{P^1 > \cdots > P^k; Z} M_2$, and $M_1$ is strictly preferred to $M_2$ if $M_1 <^{P^1 > \cdots > P^k; Z} M_2$. 
We say that $M_1$ and $M_2$ are tied if $M_1 \leq^{P^1 > \cdots > P^k; Z} M_2$ and $M_2 \leq^{P^1 > \cdots > P^k; Z} M_1$. 

Definition 3 With respect to $\leq^{P^1 > \cdots > P^k; Z}$, a structure $M$ is minimal in a 
class $S$ of structures if $M \in S$ and there is no structure $M' \in S$ such that 
$M' <^{P^1 > \cdots > P^k; Z} M$. 
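
For Herbrand structures over a common universe, Definitions 1 to 3 can be sketched as plain set comparisons. The sketch below is illustrative only: a model is assumed to be a set of (predicate, argument) pairs, strata lists the predicate names of $P^1, \ldots, P^k$ in priority order, and the function names are hypothetical. The two models used for illustration are the ones that appear later in Example 1.

```python
def restrict(model, preds):
    """Extension of `preds` in `model`: the atoms whose predicate is in `preds`."""
    return {(p, args) for (p, args) in model if p in preds}


def leq(m1, m2, strata, fixed):
    """M1 <= M2 in the sense of Definition 2 (Definition 1 when len(strata) == 1)."""
    if restrict(m1, fixed) != restrict(m2, fixed):   # condition (ii)
        return False
    for stratum in strata:                           # condition (iii)
        e1, e2 = restrict(m1, stratum), restrict(m2, stratum)
        if not e1 <= e2:
            return False
        if e1 != e2:   # later strata are unconstrained once a stratum differs
            return True
    return True


def strictly_preferred(m1, m2, strata, fixed):
    return leq(m1, m2, strata, fixed) and not leq(m2, m1, strata, fixed)


def minimal(models, strata, fixed):
    """Definition 3: members of `models` with no strictly preferred member."""
    return [m for m in models
            if not any(strictly_preferred(n, m, strata, fixed) for n in models)]


M1 = frozenset({("r", "n"), ("q", "n"), ("ab2", "n")})
M2 = frozenset({("r", "n"), ("q", "n"), ("p", "n"), ("ab1", "n")})
parallel = minimal([M1, M2], [{"ab1", "ab2"}], {"q", "r"})
prioritized = minimal([M1, M2], [{"ab1"}, {"ab2"}], {"q", "r"})
```

In the parallel case the two models are incomparable, so both are minimal within this two-element class; once ab1 is given priority over ab2, only M1 remains.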

Proposition 1 A structure $M$ is a model of $Circum(A; P; Z)$ iff $M$ is minimal 
in the class of models of $A$ with respect to $\leq^{P;Z}$ [14]. 

Proposition 2 A structure $M$ is a model of $Circum(A; P^1 > \cdots > P^k; Z)$ iff 
$M$ is minimal in the class of models of $A$ with respect to $\leq^{P^1 > \cdots > P^k; Z}$ [14]. 

2.2 General Extended Disjunctive Programs 

A general extended disjunctive program (GEDP) [11] is a set of rules of the form: 

$$L_1; \cdots; L_k; not\ L_{k+1}; \cdots; not\ L_l \leftarrow L_{l+1}, \ldots, L_m, not\ L_{m+1}, \ldots, not\ L_n, \qquad (1)$$

where each $L_i$ ($n \geq m \geq l \geq k \geq 0$) is a classical literal, i.e. either an atom $a$ 
or its negation $\neg a$ preceded by the classical negation sign $\neg$, and ";" represents 
a disjunction. The rule with $k = 0$ is called an integrity constraint. A rule with 
variables stands for the set of its ground instances. 

The semantics of a GEDP is given by the answer sets [9, 11] as follows. 

Definition 4 Let $Lit_P$ be the set of all ground literals in the language of $P$. 
First, let $P$ be a not-free GEDP (i.e., for each rule $k = l$, $m = n$). Then, $S \subseteq Lit_P$ 
is an answer set of $P$ if $S$ is a minimal set satisfying the conditions: 

1. For each ground instance of a rule $L_1; \cdots; L_k \leftarrow L_{l+1}, \ldots, L_m$ in $P$, if 
$\{L_{l+1}, \ldots, L_m\} \subseteq S$, then $L_i \in S$ for some $i$ ($1 \leq i \leq k$). In particular, 
for each integrity constraint $\leftarrow L_1, \ldots, L_m$ in $P$, $\{L_1, \ldots, L_m\} \not\subseteq S$ holds; 

2. If $S$ contains a pair of complementary literals, then $S = Lit_P$. 

Second, let $P$ be any GEDP and $S \subseteq Lit_P$. The reduct of $P$ by $S$ is a not-free 
GEDP $P^S$ obtained as follows: 

A rule $L_1; \cdots; L_k \leftarrow L_{l+1}, \ldots, L_m$ is in $P^S$ 
iff there is a ground rule of the form (1) in $P$ such that $\{L_{k+1}, \ldots, L_l\} \subseteq S$ and 
$\{L_{m+1}, \ldots, L_n\} \cap S = \emptyset$. 

Then, $S$ is an answer set of $P$ if $S$ is an answer set of $P^S$. An answer set is 
consistent if it is not $Lit_P$. The answer set $Lit_P$ is said to be contradictory. A GEDP 
is consistent if it has a consistent answer set; otherwise, it is inconsistent. 
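
Definition 4 can be made concrete with a brute-force enumeration, which is feasible only for very small ground programs. The sketch below is not how an ASP solver such as dlv works; it assumes each ground rule is a 4-tuple (head literals, head `not` literals, body literals, body `not` literals) of strings, with classical negation written as a leading "-", and all names are hypothetical. The sample program is the ground instance of the program $\Pi$ that appears later in Example 1, with the argument n dropped.

```python
from itertools import combinations


def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit


def reduct(program, s):
    """P^S: keep a rule iff its head `not` literals are in S and no body `not` literal is."""
    return [(head, body) for (head, head_not, body, body_not) in program
            if set(head_not) <= s and not set(body_not) & s]


def closed(rules, s, lits):
    """Conditions 1 and 2 of Definition 4 for a not-free program."""
    if s != lits and any(neg(l) in s for l in s):
        return False
    return all(not set(body) <= s or set(head) & s for head, body in rules)


def subsets(lits):
    items = sorted(lits)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def answer_sets(program, lits):
    lits = frozenset(lits)
    found = []
    for s in subsets(lits):
        red = reduct(program, s)
        if closed(red, s, lits) and not any(
                t < s and closed(red, t, lits) for t in subsets(s)):
            found.append(s)
    return found


PI = [
    (("ab1",), (), ("p", "r"), ()),      # ab1 <- p, r
    (("p", "ab2"), (), ("q",), ()),      # p ; ab2 <- q
    (("r",), (), (), ()),                # r <-
    (("q",), (), (), ()),                # q <-
    (("p",), ("p",), (), ()),            # p ; not p <-
    (("q",), ("q",), (), ()),            # q ; not q <-
    (("r",), ("r",), (), ()),            # r ; not r <-
]
RESULT = answer_sets(PI, {"p", "q", "r", "ab1", "ab2"})
```

The enumeration is exponential in the number of literals, so it serves only to check small examples by hand.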



3 Computing Models of Circumscription 

In this section, we introduce a sound and complete procedure for computing 
models of parallel circumscription $Circum(A; P; Z)$ or prioritized circumscription 
$Circum(A; P^1 > \cdots > P^k; Z)$. We assume that $A$ is a first order theory 
without function symbols and is given by a set of clauses of the form: 

$$A_1 \lor \cdots \lor A_\ell \lor \neg B_1 \lor \cdots \lor \neg B_m, \qquad (2)$$

where $A_i$ ($1 \leq i \leq \ell$) and $B_j$ ($1 \leq j \leq m$) are atoms. Also, we consider 
Herbrand models of $A$, which has the effect of introducing both the domain 
closure assumption (DCA) and the unique name assumption (UNA) into $A$ [21]. 
We suppose that any clause in $A$ with variables stands for the set of its ground 
instances expanded by individual constants occurring in DCA. Hereafter, let $U$ 
be the Herbrand universe of $A$, and $H$ be the Herbrand base of $A$. 

3.1 Translation for Generating Strictly Preferred Models 

We translate a parallel or prioritized circumscription into ASP in two steps. In 
the first step, given a parallel or prioritized circumscription, the set $A$ of clauses 
is translated into the following GEDP in a similar way to [20, 21], each of whose 
answer sets is a Herbrand model of $A$. 

Definition 5 Given $Circum(A; P^1 > \cdots > P^k; Z)$ (or $Circum(A; P; Z)$ if 
$k = 1$), $\Pi$ is the GEDP defined as follows: 

(i) For any clause (2) in $A$, $\Pi$ has the rule: 

$$A_1; \cdots; A_\ell \leftarrow B_1, \ldots, B_m. \qquad (3)$$

(ii) For any fixed or variable predicate $\mu$ in $A$, $\Pi$ has the rule: 

$$\mu(x)\ ;\ not\ \mu(x) \leftarrow. \qquad (4)$$

Here $x$ is the tuple of variables in each predicate. 

Definition 6 Let $P$, $Z$, $Q$ be the same as in Definition 1. For any two 
structures $M_1$, $M_2$, we write $M_1 \leq^{P} M_2$ if 

(i) $M_1 \leq^{P;Z} M_2$, 

(ii) $M_1[\![Z]\!] = M_2[\![Z]\!]$. 

Then with respect to the logic program $\Pi$, the following Theorem 1 holds. 

Theorem 1. Let $\Pi$ be the GEDP as in Definition 5. Then, (i) $M$ is an answer 
set of $\Pi$ if and only if $M$ is a minimal Herbrand model with respect to $\leq^{P}$ in 
the set of Herbrand models of $A$. (ii) In case of $Z = \emptyset$, $M$ is an answer set of 
$\Pi$ if and only if $M$ is a model of $Circum(A; P)$. 

Proof: See Appendix. 

Having the GEDP $\Pi$, the next step determines the models of the given parallel 
or prioritized circumscription based on the order $\leq^{P;Z}$ (or $\leq^{P^1 > \cdots > P^k; Z}$) on the 
set of answer sets of $\Pi$. To this end, we construct a logic program $T[\Pi; P; Z; M]$ 
(or $T[\Pi; P^1, \ldots, P^k; Z; M]$) from the given parallel or prioritized circumscription 
and any answer set $M$ of $\Pi$. The translation proposed here is inspired by the idea 
in Wakaki et al.'s method [24] to compute preferred answer sets of PLPs [20, 21]. 
The construction of the logic program $T[\Pi; P^1, \ldots, P^k; Z; M]$ (or $T[\Pi; P; Z; M]$ 
for $k = 1$) is based on the techniques of renaming atoms and meta-programming 
as follows. 

First, we encode the given answer set $M$ of $\Pi$ as well as some other answer 
set $M'$ of $\Pi$ within an answer set of $T[\Pi; P^1, \ldots, P^k; Z; M]$. This encoding 
enables us to check whether $M' <^{P^1 > \cdots > P^k; Z} M$ holds or not. To this end, we 
rename atoms so that each atom $L \in H$ in $M$ is renamed by a newly 
introduced atom $L^*$. This technique enables us to embed the 
given answer set $M$ symbolically as the set $M^*$ of renamed atoms $L^*$, together with another 
answer set $M'$, within an answer set of $T[\Pi; P^1, \ldots, P^k; Z; M]$. 

Second, we use a meta-programming technique in order to decide 

$$M' <^{P^1 > \cdots > P^k; Z} M$$

with respect to two answer sets $M$ and $M'$ of $\Pi$, where a comparison between 
an atom $a \in M$ and some atom $b \in M'$ is required. For such a comparison in our 
approach, it should be symbolically known that the newly introduced symbol $L^*$ 
is the renamed atom for the corresponding atom $L \in H$. Therefore, we not only 
introduce a new constant $L_t$ for each atom $L \in H$ as well as its renamed atom 
$L^*$, but also provide predicate symbols $m_1$ and $m_2$ such that, with respect 
to any constant $L_t$ corresponding to an atom $L \in H$ as well as its renamed 
atom $L^*$, $m_1(L_t)$ means $L \in M$ for the given answer set $M$, while $m_2(L_t)$ means 
$L \in M'$ for another answer set $M'$. 

In the following, the set of renamed atoms $L^*$ is defined as $H^*$, and the set 
of newly introduced constants $L_t$ is defined as $C$. 

Definition 7 $H^*$ and $C$ are defined as follows. 

$$H^* \stackrel{\mathrm{def}}{=} \{L^* \mid L \in H\}$$

$$C \stackrel{\mathrm{def}}{=} \{L_t \mid L \in H\}$$

Since $H$ is finite, so are $H^*$ and $C$. 

3.1.1 Parallel Circumscription 

We are ready to show the transformed logic program $T[\Pi; P; Z; M]$ for parallel 
circumscription as follows. 

Definition 8 Let $\Pi$ be the GEDP constructed from $Circum(A; P; Z)$ according 
to Definition 5, and $M$ be an answer set of $\Pi$. Then $T[\Pi; P; Z; M]$ is the GEDP 
defined as: 

$$T[\Pi; P; Z; M] \stackrel{\mathrm{def}}{=} \Pi \cup \Gamma \cup \Xi.$$

Here, $\Gamma$ is the set of domain dependent rules as follows: 

1. $L^* \leftarrow$, for each $L \in M$, 
where each $L^* \in H^*$ is the renamed atom corresponding to $L \in M$, 

2. $m_1(L_t) \leftarrow L^*$, $m_2(L_t) \leftarrow L$, 
for every $L \in H$, its renamed atom $L^* \in H^*$ and the ground term $L_t \in C$ 
expressing the atom $L$, 

3. $minp(L_t) \leftarrow$, 
for every ground term $L_t \in C$ representing the atom $p(t) \in H$ whose predicate 
symbol $p$ is in $P$, and 

4. $fixed(L_t) \leftarrow$, 
for every term $L_t \in C$ representing the atom $q(t) \in H$ whose predicate 
symbol $q$ is in $Q$, 

where $minp$ and $fixed$ are newly introduced predicate symbols, and $t$ occurring 
in items 3 and 4 is a tuple of elements in $U$ for each predicate. $\Xi$ is the set of 
domain independent rules instantiated over constants in $C$ as follows: 

5. $h_1 \leftarrow m_1(x), fixed(x), not\ m_2(x)$, 
$h_2 \leftarrow m_2(x), fixed(x), not\ m_1(x)$, 
$h \leftarrow not\ h_1, not\ h_2$, 

6. $c \leftarrow m_1(x), minp(x), not\ m_2(x)$, 
$d \leftarrow m_2(x), minp(x), not\ m_1(x)$, 

7. $subsetp \leftarrow h, c, not\ d$, 

8. $\leftarrow not\ subsetp$, 

where $h_1$, $h_2$, $h$, $c$, $d$, $subsetp$ are newly introduced propositional symbols. Rules 
of items 5, 6 and 7 check $M' <^{P;Z} M$ using the conditions (ii), (iii) in Definition 
1, where $M_1$ and $M_2$ are regarded as some answer set $M'$ and the given answer 
set $M$ in this case, respectively. Rules of item 5 mean that $h$ becomes true if 
$M'[\![Q]\!] = M[\![Q]\!]$. Rules of item 6 mean that $c$ (or $d$) becomes true if there is some 
atom $a \in M \setminus M'$ (or $a \in M' \setminus M$) whose predicate symbol is in $P$. Thus, item 
7 means that, with respect to the given answer set $M$, $subsetp$ becomes true for 
any answer set $M'$ such that $M' <^{P;Z} M$. Item 8 means that inconsistency 
is derived if $subsetp$ is not true. 

The following theorems hold for the translated logic program $T[\Pi; P; Z; M]$. 
Their proofs are omitted since they are the special case ($k = 1$) of the proofs of 
Theorem 4 and Theorem 5 given in the Appendix. 

Theorem 2. Given $Circum(A; P; Z)$, let $M$ be an answer set of $\Pi$. 
If $T[\Pi; P; Z; M]$ is consistent, then for any answer set $E$ of $T[\Pi; P; Z; M]$, there 
is an answer set $M'$ of $\Pi$ such that $M' <^{P;Z} M$ where $M' = E \cap H$. Conversely, 
if there is an answer set $M'$ of $\Pi$ such that $M' <^{P;Z} M$, $T[\Pi; P; Z; M]$ is 
consistent and has an answer set $E$ such that $E \cap H = M'$. 

Corollary 1 Given $Circum(A; P; Z)$, let $M$ be an answer set of $\Pi$. Then, 
$T[\Pi; P; Z; M]$ is inconsistent if and only if there is no answer set $M'$ of $\Pi$ 
such that $M' <^{P;Z} M$. 

Proof: This is immediately proved as the contrapositive of Theorem 2. 

Theorem 3. (Soundness and Completeness Theorem) 
Given $Circum(A; P; Z)$, $T[\Pi; P; Z; M]$ is inconsistent for an answer set $M$ of 
$\Pi$ if and only if $M$ is a model of $Circum(A; P; Z)$. 

Example 1. Let us consider parallel circumscription $Circum(A; ab_1, ab_2; p)$ $^1$ 
where $P = \{ab_1, ab_2\}$, $Q = \{q, r\}$, $Z = \{p\}$. $A$ is given by the set of clauses as 
follows: 

$\{\neg p(x) \lor ab_1(x) \lor \neg r(x),\ \ p(x) \lor ab_2(x) \lor \neg q(x),\ \ r(n),\ \ q(n)\}$ 

where $x$ is an individual variable and $n$ is a constant. Then the GEDP $\Pi$ is 
constructed from $Circum(A; ab_1, ab_2; p)$ as follows: 

$\Pi$: $ab_1(x) \leftarrow p(x), r(x)$, $\quad p(x)\ ;\ ab_2(x) \leftarrow q(x)$, 
$r(n) \leftarrow$, $\quad q(n) \leftarrow$, $\quad p(x)\ ;\ not\ p(x) \leftarrow$, 
$q(x)\ ;\ not\ q(x) \leftarrow$, $\quad r(x)\ ;\ not\ r(x) \leftarrow$. 

$^1$ This is the well-known Quaker and Republican problem. 

$\Pi$ has the following two answer sets $M_1$, $M_2$: 

$M_1 = \{r(n), q(n), ab_2(n)\}$, $\quad M_2 = \{r(n), q(n), p(n), ab_1(n)\}$. 

Corresponding to the Herbrand base $H = \{p(n), q(n), r(n), ab_1(n), ab_2(n)\}$, 
we prepare $H^*$ and $C$ as follows. 

$H^* = \{p^*(n), q^*(n), r^*(n), ab_1^*(n), ab_2^*(n)\}$, $\quad C = \{pn, qn, rn, ab1n, ab2n\}$. 

Then for $M_1$, $T[\Pi; P; Z; M_1] = \Pi \cup \Gamma_1 \cup \Xi$ is constructed with the following $\Gamma_1$: 

$\Gamma_1$: $r^*(n) \leftarrow$, $\quad q^*(n) \leftarrow$, $\quad ab_2^*(n) \leftarrow$, $\quad m_1(pn) \leftarrow p^*(n)$, 
$m_1(qn) \leftarrow q^*(n)$, $\quad m_1(rn) \leftarrow r^*(n)$, $\quad m_1(ab1n) \leftarrow ab_1^*(n)$, 
$m_1(ab2n) \leftarrow ab_2^*(n)$, $\quad m_2(pn) \leftarrow p(n)$, $\quad m_2(qn) \leftarrow q(n)$, 
$m_2(rn) \leftarrow r(n)$, $\quad m_2(ab1n) \leftarrow ab_1(n)$, $\quad m_2(ab2n) \leftarrow ab_2(n)$, 
$minp(ab1n) \leftarrow$, $\quad minp(ab2n) \leftarrow$, $\quad fixed(qn) \leftarrow$, $\quad fixed(rn) \leftarrow$. 

In this case, both $T[\Pi; P; Z; M_1]$ and $T[\Pi; P; Z; M_2]$ are inconsistent. Therefore, 
according to Theorem 3, we can conclude that both $M_1$ and $M_2$ are models of 
$Circum(A; ab_1, ab_2; p)$. 
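
The domain dependent rules of Example 1 follow mechanically from items 1 to 4 of Definition 8. The sketch below generates them as plain strings; it is an illustration only. The textual rule syntax, the way a constant $L_t$ is named (dropping the parentheses, which matches $C$ above but would need care for atoms with several arguments), and the helper names are all assumptions, not the authors' encoding.

```python
def constant(atom):
    """Hypothetical naming of L_t: 'ab1(n)' -> 'ab1n'."""
    return atom.replace("(", "").replace(")", "").replace(",", "")


def star(atom):
    """Renamed atom L*: 'ab1(n)' -> 'ab1*(n)'."""
    pred, rest = atom.split("(", 1)
    return f"{pred}*({rest}"


def gamma(base, model, minimized, fixed):
    """Items 1-4 of Definition 8 as text, for Herbrand base `base` and answer set `model`."""
    pred = lambda a: a.split("(", 1)[0]
    rules = [f"{star(a)} <-" for a in base if a in model]                   # item 1
    rules += [f"m1({constant(a)}) <- {star(a)}" for a in base]              # item 2
    rules += [f"m2({constant(a)}) <- {a}" for a in base]
    rules += [f"minp({constant(a)}) <-" for a in base if pred(a) in minimized]   # item 3
    rules += [f"fixed({constant(a)}) <-" for a in base if pred(a) in fixed]      # item 4
    return rules


H = ["p(n)", "q(n)", "r(n)", "ab1(n)", "ab2(n)"]
M1 = {"r(n)", "q(n)", "ab2(n)"}
GAMMA1 = gamma(H, M1, minimized={"ab1", "ab2"}, fixed={"q", "r"})
```

For $M_1$ this yields the 17 rules listed in Example 1: three renamed facts, ten $m_1$/$m_2$ rules, two $minp$ facts and two $fixed$ facts.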

3.1.2 Prioritized Circumscription 

The translated logic program $T[\Pi; P^1, \ldots, P^k; Z; M]$ for prioritized 
circumscription is shown as follows. 

Definition 9 Suppose a tuple $P$ of predicates is decomposed into $k$ disjoint 
parts $P^1, \ldots, P^k$. Let $\Pi$ be the GEDP constructed from $Circum(A; P^1 > \cdots > P^k; Z)$ 
according to Definition 5, and $M$ be an answer set of $\Pi$. 

Then $T[\Pi; P^1, \ldots, P^k; Z; M]$ is the GEDP defined as: 

$$T[\Pi; P^1, \ldots, P^k; Z; M] \stackrel{\mathrm{def}}{=} \Pi \cup \Gamma \cup \Xi.$$

Here, $\Gamma$ is the set of domain dependent rules, which has not only the same rules 
of items 1, 2, 4 as $\Gamma$ given in Definition 8 but also rules of item 3 as follows: 

3. $minp_i(L_t) \leftarrow$, 
for every term $L_t \in C$ representing the atom $p(t) \in H$ whose predicate 
$p$ is in $P^i$ ($1 \leq i \leq k$), 

where each $minp_i$ ($1 \leq i \leq k$) is a newly introduced predicate symbol. $\Xi$ is the 
set of domain independent rules instantiated over constants in $C$ as follows: 

5. $h_1 \leftarrow m_1(x), fixed(x), not\ m_2(x)$, 
$h_2 \leftarrow m_2(x), fixed(x), not\ m_1(x)$, 
$h \leftarrow not\ h_1, not\ h_2$, 

6. $c_i \leftarrow m_1(x), minp_i(x), not\ m_2(x)$, 
$d_i \leftarrow m_2(x), minp_i(x), not\ m_1(x)$, 
$eq_i \leftarrow not\ c_i, not\ d_i$, $\quad (1 \leq i \leq k)$ 

7. $smaller_1 \leftarrow h, c_1, not\ d_1$, 
$smaller_i \leftarrow h, eq_1, \ldots, eq_{i-1}, c_i, not\ d_i$, $\quad (2 \leq i \leq k)$ 
$subsetp \leftarrow smaller_i$, $\quad (1 \leq i \leq k)$ 

8. $\leftarrow not\ subsetp$, 

where $h_1$, $h_2$, $h$, $c_i$, $d_i$, $eq_i$, $smaller_i$ and $subsetp$ are propositional symbols. 
Rules of items 5, 6 and 7 check $M' <^{P^1 > \cdots > P^k; Z} M$ using the conditions (ii), (iii) 
in Definition 2. Rules of item 7 mean that, with respect to the given answer set 
$M$, $subsetp$ becomes true for any answer set $M'$ such that $M' <^{P^1 > \cdots > P^k; Z} M$. 
Thus due to item 8, inconsistency is derived if $subsetp$ is not true. 
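
Once both the given answer set $M$ and a candidate $M'$ are fixed, the propositional symbols of items 5 to 7 can be evaluated directly. The following sketch mirrors that evaluation in ordinary code rather than through an ASP solver; the function name and the representation of each stratum as a set of ground atoms are assumptions made for illustration, and the sample data are the answer sets of Example 1 with $ab_1$ given priority over $ab_2$.

```python
def subsetp(m_given, m_other, strata, fixed_atoms):
    """Truth value of `subsetp` for given M (encoded by m1) and candidate M' (by m2).

    strata: list of sets of ground atoms for P^1, ..., P^k, in priority order;
    fixed_atoms: set of ground atoms whose predicate is in Q.
    """
    h1 = any(a in m_given and a not in m_other for a in fixed_atoms)   # item 5
    h2 = any(a in m_other and a not in m_given for a in fixed_atoms)
    h = not h1 and not h2
    c = [any(a in m_given and a not in m_other for a in s) for s in strata]   # item 6
    d = [any(a in m_other and a not in m_given for a in s) for s in strata]
    eq = [not ci and not di for ci, di in zip(c, d)]
    smaller = [h and all(eq[:i]) and c[i] and not d[i]                  # item 7
               for i in range(len(strata))]
    return any(smaller)


M1 = {"r(n)", "q(n)", "ab2(n)"}
M2 = {"r(n)", "q(n)", "p(n)", "ab1(n)"}
FIXED = {"q(n)", "r(n)"}
PRIORITIZED = [{"ab1(n)"}, {"ab2(n)"}]
PARALLEL = [{"ab1(n)", "ab2(n)"}]
```

Here `subsetp(M2, M1, PRIORITIZED, FIXED)` is true because M1 is strictly smaller on the first stratum, whereas no candidate makes it true when M1 is the given answer set.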

With respect to the translated logic program $T[\Pi; P^1, \ldots, P^k; Z; M]$ for 
prioritized circumscription, the following theorems hold. 

Theorem 4. Given $Circum(A; P^1 > \cdots > P^k; Z)$, let $M$ be an answer set 
of $\Pi$. If $T[\Pi; P^1, \ldots, P^k; Z; M]$ is consistent, then for any answer set $E$ of 
$T[\Pi; P^1, \ldots, P^k; Z; M]$, there is an answer set $M'$ of $\Pi$ such that 
$M' <^{P^1 > \cdots > P^k; Z} M$ where $M' = E \cap H$. Conversely, if there is an answer set $M'$ of $\Pi$ such that 
$M' <^{P^1 > \cdots > P^k; Z} M$, $T[\Pi; P^1, \ldots, P^k; Z; M]$ is consistent and has an answer 
set $E$ such that $E \cap H = M'$. 

Proof: See Appendix. 

Corollary 2 Given $Circum(A; P^1 > \cdots > P^k; Z)$, let $M$ be an answer set of 
$\Pi$. Then, $T[\Pi; P^1, \ldots, P^k; Z; M]$ is inconsistent if and only if there is no answer 
set $M'$ of $\Pi$ such that $M' <^{P^1 > \cdots > P^k; Z} M$. 

Proof: This is immediately proved as the contrapositive of Theorem 4. 

Theorem 5. (Soundness and Completeness Theorem) 
Given $Circum(A; P^1 > \cdots > P^k; Z)$, $T[\Pi; P^1, \ldots, P^k; Z; M]$ is inconsistent for 
an answer set $M$ of $\Pi$ if and only if $M$ is a model of 
$Circum(A; P^1 > \cdots > P^k; Z)$. 

Proof: See Appendix. 

Example 2. Consider prioritized circumscription $Circum(A; ab_1 > ab_2; p)$. $A$ is 
the same as in Example 1, and let $P^1 = \{ab_1\}$, $P^2 = \{ab_2\}$. Then for the answer 
set $M_1$, $T[\Pi; P^1, P^2; Z; M_1] = \Pi \cup \Gamma_2 \cup \Xi$ is constructed with $\Gamma_2$ as follows: 

$\Gamma_2 = (\Gamma_1 \setminus \{minp(ab1n) \leftarrow,\ minp(ab2n) \leftarrow\}) \cup \{minp_1(ab1n) \leftarrow,\ minp_2(ab2n) \leftarrow\}$ 

where $M_1$ and $\Gamma_1$ are given in Example 1. In this case, $T[\Pi; P^1, P^2; Z; M_1]$ is 
inconsistent, but $T[\Pi; P^1, P^2; Z; M_2]$ is consistent. Thus we can conclude that 
only $M_1$ is a model of $Circum(A; ab_1 > ab_2; p)$ by Theorem 5. 

3.2 Procedure to Compute Circumscriptive Models 

We show the sound and complete procedure CircModelsGenerator, which 
generates all models of the given circumscription based on Theorem 3 and 
Theorem 5. In order to decide the minimal ones among the answer sets of $\Pi$, 
the procedure makes use of the inconsistency checking of the logic program 
$T[\Pi; P^1, \ldots, P^k; Z; M]$, whose answer sets (if any exist) yield models strictly 
preferred to $M$ with respect to $\leq^{P^1 > \cdots > P^k; Z}$. Although its encoding has a guess 
and check structure, our procedure is very simple, as follows. 

Procedure 1 CircModelsGenerator($A$; $P^1, \ldots, P^k$; $Z$; $\Delta$) 

Input: $Circum(A; P^1 > \cdots > P^k; Z)$ (or $Circum(A; P; Z)$ if $k = 1$) 

Output: the set $\Delta$ of all models of $Circum(A; P^1 > \cdots > P^k; Z)$ 

1. Generate the GEDP $\Pi$ from $A$, $Z$, $Q$ according to Definition 5, and compute 
the set $AS$ of all answer sets of $\Pi$. $\Delta := \emptyset$. 

2. If $AS$ is the empty set $\emptyset$, then $\Delta := AS$, return $\Delta$. 

3. For any answer set $S \in AS$, if $T[\Pi; P^1, \ldots, P^k; Z; S]$ is inconsistent, 
then $\Delta := \Delta \cup \{S\}$. 

4. return $\Delta$. 
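
The control flow of Procedure 1 can be imitated on small ground theories without an ASP solver. The sketch below is a brute-force stand-in under stated assumptions, not the dlv-based implementation: step 1 uses the characterization of Theorem 1(i) to obtain the answer sets of $\Pi$, and the inconsistency test of step 3 is replaced, via Corollary 2, by a search for a strictly preferred answer set. Ground clauses are assumed to be pairs (positive atoms, negated atoms) of strings, and all function names are hypothetical.

```python
from itertools import combinations


def pred(atom):
    return atom.split("(", 1)[0]


def ext(model, preds):
    return frozenset(a for a in model if pred(a) in preds)


def herbrand_models(clauses, base):
    """Subsets of the Herbrand base that satisfy every ground clause."""
    atoms = sorted(base)
    for size in range(len(atoms) + 1):
        for combo in combinations(atoms, size):
            m = frozenset(combo)
            if all(set(pos) & m or set(neg) - m for pos, neg in clauses):
                yield m


def answer_sets_of_pi(clauses, base, strata):
    """Step 1 via Theorem 1(i): subset-minimal models among those that agree
    on all fixed and variable atoms."""
    minimized = set().union(*strata)
    groups = {}
    for m in herbrand_models(clauses, base):
        key = frozenset(a for a in m if pred(a) not in minimized)
        groups.setdefault(key, []).append(m)
    return [m for group in groups.values() for m in group
            if not any(n < m for n in group)]


def strictly_preferred(m1, m2, strata, fixed):
    def leq(a, b):
        if ext(a, fixed) != ext(b, fixed):
            return False
        for s in strata:
            if not ext(a, s) <= ext(b, s):
                return False
            if ext(a, s) != ext(b, s):
                return True
        return True
    return leq(m1, m2) and not leq(m2, m1)


def circ_models_generator(clauses, base, strata, fixed):
    answer_sets = answer_sets_of_pi(clauses, base, strata)            # step 1
    return [s for s in answer_sets                                    # steps 2-4
            if not any(strictly_preferred(t, s, strata, fixed) for t in answer_sets)]


# Ground version of the theory A of Example 1.
BASE = ["p(n)", "q(n)", "r(n)", "ab1(n)", "ab2(n)"]
A = [(["ab1(n)"], ["p(n)", "r(n)"]), (["p(n)", "ab2(n)"], ["q(n)"]),
     (["r(n)"], []), (["q(n)"], [])]
parallel = circ_models_generator(A, BASE, [{"ab1", "ab2"}], {"q", "r"})
prioritized = circ_models_generator(A, BASE, [{"ab1"}, {"ab2"}], {"q", "r"})
```

On this theory the sketch returns both answer sets for the parallel case and only $M_1$ for the prioritized case, in agreement with Examples 1 and 2. Since it enumerates all subsets of the Herbrand base, it is usable only as a cross-check on tiny inputs.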

3.3 Implementation and Experimental Results 

The procedure CircModelsGenerator presented in Section 3.2 has been 
implemented using the ASP solver dlv [6] and the C++ programming language. The 
current implementation runs under the Linux and Windows operating systems. 
Some experimental results and performance figures obtained under Linux on a 2.5 
GHz Pentium IV computer are shown below. 

Example 3. In Examples 6 and 12 in [16], parallel circumscription 
$Circum(A; p, q; r)$ and prioritized circumscription $Circum(A; p, r > q)$ are respectively 
given, where $A$ is the set of clauses as follows $^2$: 

$\{p(a) \lor q(a) \lor r(a),\ \ r(a) \supset r(b),\ \ p(b) \lor q(b),\ \ r(c) \lor p(c),\ \ p(c) \supset r(d)\}$ 

Our implementation correctly computes 4 models w.r.t. $Circum(A; p, q; r)$ in 4 
msec and 2 models w.r.t. $Circum(A; p, r > q)$ in 5 msec. These are CPU times 
averaged over 20 runs to reduce inaccuracies caused by the system clock. 

$^2$ These are counter-examples to Theorem 3.8 and Theorem 3.9 in [21] (or Lemma 
3.7 and Theorem 3.8 in [20]) respectively. In fact, 30 models are computed for 
$Circum(A; p, q; r)$ according to their Theorem 3.8 in [21] (or Lemma 3.7 in [20]). 

Example 4. $Circum(A; e_1 > e_2 > e_3 > c; p, v, m)$ expresses the meeting 
scheduling problem in [22], where $A$ is the set of clauses as follows: 

$\{c(1) \lor c(2) \lor c(3),\ \ p(x) \supset c(x),\ \ v(x) \supset c(x),\ \ m(x) \supset c(x),$ 
$c(x) \land \neg e_1(x) \supset p(x),\ \ c(x) \land \neg e_2(x) \supset v(x),\ \ c(x) \land \neg e_3(x) \supset m(x),$ 
$\neg p(1),\ \ \neg v(3),\ \ \neg m(2)\}$ 

It has the unique model $M_1 = \{c(2), e_3(2), p(2), v(2)\}$, which is computed 
correctly in 365 msec by our current implementation. On the other hand, 
prioritized circumscription $Circum(A; e_1 > e_2, e_3 > c; p, v, m)$ has two models, 
$M_1$ and $M_2 = \{c(3), e_2(3), p(3), m(3)\}$, which are also computed in 383 msec 
by the current implementation. These are also CPU times averaged over 10 
runs. 



4 Related Work and Conclusion 

We presented a new method of compiling parallel and prioritized circumscription 
without function symbols into answer set programming. In computing 
circumscription by logic programming, almost all previous methods [8, 20, 23, 21] 
translate a given circumscription into some semantically equivalent logic program. 
In our new approach, a given circumscription is translated into a logic program 
whose answer sets (if any exist) yield models strictly preferred to a given candidate 
model. Hence its inconsistency enables us to determine models of prioritized 
circumscription. With respect to complexity, it is well known that model checking 
of circumscription is coNP-complete [3] and literal entailment from circumscription 
is $\Pi^P_2$-complete [5]. As for the proposed procedure, it calls an ASP solver 
a polynomial number of times. Our procedure has been implemented using an ASP solver. 
Under the current implementation, models of circumscription are correctly 
computed. The performance presented in Section 3.3 is faster than or comparable 
with the figures (30 ~ 420 msec) shown in [16, p.75], which are the times taken to 
compute minimal models using integer programming. To the best of our knowledge, 
there are not enough benchmarks for implemented systems that compute 
prioritized circumscription. Thus it may be concluded that our approach is more 
efficient than the other previously implemented methods of computing prioritized 
circumscription. 

Our approach to computing circumscriptive models is inspired by [24]. It 
turns out that Janhunen and Oikarinen's approach for testing the equivalence of 
logic programs [13, 17] is based on a similar technique. Their approach and ours 
construct the translated logic programs for the respective purposes using the 
technique of renaming atoms. More precisely, answer sets of our translated 
logic program yield the strictly preferred models with respect to the preorder 
$\leq^{P^1 > \cdots > P^k; Z}$ used to check the minimality of each candidate Herbrand model, 
whereas answer sets of their translated logic program yield the counter-examples 
used to check the equivalence of two logic programs. Thus, our encoding 
technique may become one of the general techniques or frameworks using ASP for 
problem solving whose complexity is in the second level of the polynomial hierarchy. 

Using a different preference ordering, Brewka et al. [1,2] proposed simi- 
lar approaches to ours to compute preferred models of logic programs with 
ordered disjunction [1] and for answer set optimization [2]. Their both ap- 
proaches use two programs, a generator and a tester which correspond to 77 
and T[77; P 1 , . . . , P k \ Z\ M] respectively for our approach. However, their imple- 
mentation is different from ours since their generator and the tester are run in 
an interleaved fashion which is similar technique used in Janlrunen et al. [12]. 
Eiter et al. [7] explore the implementation of preference semantics for logic
programs by means of meta-programming in ASP. Their approach makes it possible to
compute preferred answer sets for three different kinds of preferential ASPs,
including Delgrande et al.'s approach [4]. Their basic meta-interpreter Pj has
the “guess and check” structure, which is similar to our procedure. Our approach
also uses the technique of meta-programming, but its idea and aim are
different from those of Eiter et al.'s meta-programming.

Finally, in future work, we want to improve the procedure presented in
this paper to enhance its efficiency by pruning the search space with an
intelligent backtracking technique similar to those proposed in [12,18].
Appendix: Proofs of Theorems

Proof of Theorem 1 

Proof: (i) Let $\mathcal{P}$, $\mathcal{Q}$, $\mathcal{Z}$ be disjoint subsets of the Herbrand base $H$ of $A$ such that
$\mathcal{P}$, $\mathcal{Q}$, $\mathcal{Z}$ are sets of ground instances for minimized predicates, fixed predicates,
and variable predicates respectively. It is obvious that any answer set $M$ of $\Pi$
is a Herbrand model of $A$ since $M$ satisfies the rules of the form (3) in $\Pi$. In
addition, any answer set $M$ such that $S \subseteq M$ for any set $S \in 2^{\mathcal{Q} \cup \mathcal{Z}}$ satisfies all
rules of the form (4) in $\Pi$. Then,

$M$ is an answer set of $\Pi$
iff $M$ is an answer set of $\Pi^M$

iff 717 is a minimal Herbrand model of A U {/i(x) ; not it(x) <— } M 

iff $M$ is a minimal Herbrand model of $A \cup \{M \cap (\mathcal{Q} \cup \mathcal{Z})\}$

iff $M$ is minimal with respect to set inclusion ($\subseteq$) in the set of Herbrand
models of $A$ each of whose extension of $\mathcal{Q} \cup \mathcal{Z}$ coincides with $M \cap (\mathcal{Q} \cup \mathcal{Z})$

iff $M$ is minimal with respect to $<^{P}$ in the set of all Herbrand models of $A$.

(ii) It is obvious due to the result of (i). □ 

Proof of Theorem 4

Let $\mathcal{P}^1, \ldots, \mathcal{P}^k$ be disjoint subsets of $\mathcal{P}$, each of which is
a set of atoms with the predicate from $P^i$ ($1 \leq i \leq k$).
Proof: (=>) Since T[77; P 1 , . . . , P k \ Z\ M] = II U P U E is consistent, it holds
that subsetp € E for any answer set E of T[i7; P 1 , . . . , P fc ; Z\ M], and E is 
also an answer set of T[77; P 1 , . . . , P k \ Z\ M] \ {<— not subsetp}. Then, since 
P US \ {<— not subsetp} is a stratified logic program whose each rule has the 
ground head atom not occurring in H, E is an augmented answer set of 77 which 
not only includes some answer set of II but also has ground head atoms of the 
rules in P U E \ {^— not subsetp}. Thus M' — E n H should be an answer set 
of II. According to item 1 in Definition 8, it is obvious that M* = E C 1 H* is 
a renamed answer set of a given answer set M such that each L* £ M* is a 
renamed literal w.r.t L £ M. Then, according to item 2, 

mi(at) £ E iff a* £ M* (i.e. a £ M) 
m 2 (b t ) £ E iff b£M' = EC\H 

where at £ C, a* £ H* w.r.t a £ H and bt £ C w.r.t b £ H. Thus, according to rul- 
es in Definition 9, it follows that, for any answer set E of T[II\P l , . . . , P k -,Z-,AI], 
s ubsetp £ E 

iff smallert £ E for some i ( 1 < i < k) 

iff h £ E, eq.j £ E, Ci £ E and dt ^ E for 1 < j < i — 1 and 1 < 3i < k 

iff hi £ E, h 2 fL E, eqi £ E , . . . , eqt-\ £ E a £ E and di fLE 
iff -i ( 3 at £ C s.t. mi(at) £ E A f ixed(at) £ E A m 2 (ai)£E) A 
-> ( 3 bt £ C s.t. m 2 {bt) £ E A fixed(bt) £ E A m\(bt)£E)A 
Aj = i{ _ '(3e( £ Cs.t. mi(e|) £ E A minpj(e{) £ E A m 2 (e{)£E)A 
r*(3 g{ £ Cs.t. ni 2 (gi) £ E A minpj(g{) £ E A mi(g{ )$E)}A 
-* ( 3 ut £ C s.t. ni 2 (ut) £ E A minpi(ut) £ E A mi(ut)$.E)A 
( 3 vt £ C s.t. m\(vt) £ E A minpi(vt ) £ E A m 2 (vt)^E) 
iff -i ( 3a £ Q s.t. a £ M A afLM'} A -> ( 3b £ Q s.t. b £ M' A b^M) A 

Api{^ ( 3e^' £ V j s.t. e j £M\ M') A ^ ( 3 gi £ V j s.t. g j £ M' \ M)} 

A— i ( 3 u £ P 4 s.t. u £ M' \ M) A ( 3u £ V 1 s.t. v £ M\ M') 

iff (AT |Q] = M\Q\) A Ap=\(M'i pj l = A (M'[P l ] c Af[P*]) 

iff M' < pl >-> pk 'Z M where A I' = E C\ H. 

(<=) Suppose that there is an answer set M' of II such that A I' < pl > -> pk ; z 
M. Now, we define a set r as follows: 

r d = T[II ; P 1 , . . . , P k ; Z\ M] \ not subsetp} 

It is easily shown that, not only any answer set F of r is an augmented answer 
set of II such that N = F n H for some answer set N of II but also for any 
answer set N of II, there is some answer set P of r such that F (1 H = N. 
According to the assumption, M' is also an answer set of II. Therefore, there 
should be some answer set F' of r such that F' n H = M' . With respect to F' , 
it is obvious that M* = F' fl H* is a renamed answer set of a given answer set 
M according to item 1 in Definition 8, and due to item 2, 

mi(at) £ F' iff a* £ M* (i.e. a £ M) 
m 2 (bt) £ F' iff b£ F 1 OH = M 1 

Since for such an answer set F' of r, M' < p > ' >p 'Z A/I where M' = F' fl H is 
satisfied, it follows that, for M' = F' fl H, 

M' < pl > ~> pk ;Z M 
iff $(M'[Q] = M[Q]) \wedge \bigwedge_{j=1}^{i-1} (M'[P^j] = M[P^j]) \wedge (M'[P^i] \subset M[P^i])$ for $\exists i \leq k$
iff $h \in F'$, $eq_1 \in F'$, $\ldots$, $eq_{i-1} \in F'$, $c_i \in F'$ and $d_i \notin F'$
iff $smaller_i \in F'$ for $1 \leq \exists i \leq k$
iff $subsetp \in F'$

Hence $F'$ should also be an answer set of $T[\Pi; P^1, \ldots, P^k; Z; M]$ which
satisfies $M' <^{P^1 > \cdots > P^k; Z} M$ for $M' = F' \cap H$. Therefore $T[\Pi; P^1, \ldots, P^k; Z; M]$ is
consistent. □

Proof of Theorem 5 

Proof: Let Ma and .45 be the sets of Her brand models of A and 77 respectively. 
(=>) Suppose that F[77; P 1 , . . . , P k \ Z\ 717] is inconsistent for a given answer set 
717 of 77. Then according to Corollary 2, there is not any answer set 717' of II 
such that M' < p > ' >p < z 717. Hence, for any answer set M' G AS such that 
M'JQ] = M[Q], if M' is not tie w.r.t. 717, it follows that, 

717 < pl > -> pk Z M', (5) 

where 7\7|P J ] = 7l7'[P J ] (j = 1, . . . , i — 1) and 
7V7[P*] C 717' [P*] for 3 i < k. 

Since such M' G AS is minimal in A 4 a with respect to < p due to Theorem 1, for 
any answer set M" G A7a\AS such that M' [Q] = 7V7"[<5] and 7V7'[Z] = 717" |Z], 
it follows that, 

M'< P M", where 7V7'[P] C M"\Pj and P = P 1 U • • • U P fc . (6) 

Thus according to (5) and (6), with respect to 717 and such 717" G M a \AS via 
M' G AS such that 717 [Q] = M' [Q] = 717" [Q], the following holds transitively. 

717 < pl > -> pk \ z M", (7) 

where M[Pi] = M'JP j j (j = 1, 1) and 

M{P e \ C M" [P e j for 1 < 3 t<i. 

Thus, due to (5) and (7), for any answer set TV G A \a which is not tie w.r.t. 717 
and N[Qj = M[Q], 

1 VI < pl > -> pk ; z i v (8) 

is derived. On the other hand, for M' G AS(C A 4a) which is tie w.r.t. 717, 

M'jt pl> - >pk ’ Z M (9) 

due to Corollary 2. As a result, due to (8) and (9), there is no model TV G A A a 
such that TV < p > ' >p Z M. Therefore, 717 is minimal in A4a with respect to 
< p > ' >p which means that 717 is a model of Circum(A; P 1 > • • • > P k \ Z). 
($\Leftarrow$) The contrapositive is proved. That is, in the following, we prove that if
$T[\Pi; P^1, \ldots, P^k; Z; M]$ is consistent for an answer set $M$ of $\Pi$, then $M$ is not a
model of $Circum(A; P^1 > \cdots > P^k; Z)$.

Suppose that $T[\Pi; P^1, \ldots, P^k; Z; M]$ is consistent for an answer set $M$ of $\Pi$.
Then, according to Theorem 4, there exists some answer set $M'$ of $\Pi$ such that
$M' <^{P^1 > \cdots > P^k; Z} M$. Since both $M$ and $M'$ are Herbrand models in $\mathcal{M}_A$, $M$ is
not minimal in $\mathcal{M}_A$ with respect to $<^{P^1 > \cdots > P^k; Z}$ due to Definition 3. Thus, $M$
is not a model of $Circum(A; P^1 > \cdots > P^k; Z)$ due to Proposition 2. □
Abstract. We present a new technique for the optimization of (partially) bound
queries over disjunctive datalog programs. The technique exploits the propaga- 
tion of query bindings, and extends the Magic-Set optimization technique (origi- 
nally defined for non-disjunctive programs) to the disjunctive case, substantially 
improving on previously defined approaches. 

Magic-Set-transformed disjunctive programs frequently contain redundant rules. 
We tackle this problem and propose a method for preventing the generation of 
such superfluous rules during the Magic-Set transformation. In addition, we pro- 
vide an efficient heuristic method for the identification of redundant rules, which 
can be applied in general, even if Magic-Sets are not used. 

We implement all proposed methods in the DLV system - the state-of-the-art 
implementation of disjunctive datalog - and perform some experiments. The ex- 
perimental results confirm the usefulness of Magic-Sets for disjunctive datalog, 
and they highlight the computational gain obtained by our method, which
significantly outperforms the previously proposed Magic-Set method for disjunctive
datalog programs.



1 Introduction 

Disjunctive datalog (Datalog$^{\vee}$) programs are logic programs where disjunction may
occur in the heads of rules [1,2]. Disjunctive datalog is very expressive in a precise
mathematical sense: it can express every property of finite ordered structures that
is decidable in the complexity class $\Sigma_2^P$ [2]. Therefore, under widely believed
assumptions, Datalog$^{\vee}$ is strictly more expressive than normal (disjunction-free) datalog, which
can express only problems of lower complexity. Importantly, besides enlarging the class
of applications which can be encoded in the language, disjunction often allows for
representing problems of lower complexity in a simpler and more natural fashion (see [3]).

Recently, disjunctive datalog has been employed in several projects, mainly due to the
availability of some efficient inference engines, such as the DLV system [4] and the
GnT system [5]. E.g., in [6] this formalism has been shown to be very well-suited for
database repair, and the European Commission has funded a couple of IST projects
focusing on the exploitation of disjunctive datalog in “hot” application areas like
information integration and knowledge management$^1$.

$^1$ The exploitation of disjunctive datalog for information integration is the main focus of the
INFOMIX project (IST-2001-33570), while an application of disjunctive datalog for knowledge
management is studied in ICONS (IST-2001-32429).
The increasing application of disjunctive datalog systems stimulates the research on
algorithms and optimization techniques, which make these systems more efficient and 
more widely applicable. Within this framework, we investigate here a promising line 
of research consisting of the extension of deductive database techniques and, specifi- 
cally, of binding propagation techniques exploited in the Magic-Set method [7-12], to 
nonmonotonic logic languages like disjunctive datalog. 

Intuitively, the goal of the Magic-Set method (originally defined for non-disjunctive 
datalog queries only) is to use the constants appearing in the query to reduce the size 
of the instantiation by eliminating “a priori” a number of ground instances of the rules 
which cannot contribute to the derivation of the query goal. 

The first extension of the Magic-Set method to disjunctive programs is due to [13],
where the author observes that binding propagation strategies have to be changed for
disjunctive rules so that each time a head predicate receives some binding from the
query, it eventually propagates this relevant information to all the other head predicates
as well as to the body predicates (see Section 3.1). An algorithm implementing the
above strategy has also been proposed in [13]. Roughly, it is a rewriting algorithm that
bloats the program with some additional predicates (called collecting predicates),
besides the standard “magic” ones (intrinsic to the Magic-Set method), in order to make the
propagation strategy work. In the following we call this algorithm the Auxiliary Predicates
Method (APM).

In this paper we provide fresh and refined ideas (w.r.t. [13]) for extending the Magic- 
Set method to disjunctive datalog queries. In particular, we observe that the method in 
[13] has two major drawbacks. First, the introduction of the new (collecting) predicates 
enlarges the size of the grounding and consequently reduces the gain that could
potentially be achieved by the optimization. Second, several redundant rules (rules that
are subsumed by the rest of the program) are frequently generated by the application of this
method. Since the number of rules in a program is a critical performance factor, these
redundancies can deteriorate run-time behavior. In extreme cases this overhead alone 
can outweigh the benefits of the optimization - since the evaluation of a disjunctive 
datalog program requires exponential time in the size of its instantiation, a polynomial 
increase in the size of the program instantiation may give an exponential increase in the 
program evaluation time. 

Here, we address both problems above. Specifically, the main contribution is the 
following: 

> We define a new Magic-Set method for disjunctive datalog. The new method, 
called Disjunctive Magic-Set (DMS), overcomes some drawbacks of the previous 
magic-set methods for disjunctive datalog. We provide an algorithm for the proposed 
DMS method, which involves a generalization of sideways information passing to the 
disjunctive case. Importantly, we formally prove the correctness of the DMS method by 
showing that given a query Q over a program P, the brave and cautious answers of Q 
over P coincide, respectively, with the brave and cautious answers of Q over P', where 
P' is the rewriting of P under DMS. 

> We design effective techniques for avoiding redundant rules. The head-to-head 
binding propagation needed for disjunctive programs (see [13] and Section 3.1), very 
often causes the generation of many redundant rules (both APM and DMS are affected 
by this problem). We experimentally observe that the presence of redundant rules slows
down the computation significantly, and may even outweigh the advantages of the
magic sets optimization. Thus, we design two techniques for redundant-rule prevention
and elimination, respectively. The former technique prevents some cases of generation
of redundant rules by storing some extra information on the binding-propagation flow.
Since the problem of redundant-rule identification is intractable (like clause
subsumption), to eliminate “a posteriori” the redundant rules that could not be avoided by the
former technique, we design a new and efficient heuristic for identifying redundant
rules. Note that this heuristic is not specific to the disjunctive Magic-Set method and
can be applied to any type of logic program, even in the presence of unstratified
negation and constraints. The enhancement of DMS with both our redundancy prevention
and elimination techniques yields an improved method, called Optimized Disjunctive
Magic-Set Method (ODMS).

> We implement all the proposed methods and techniques. In particular, we
implement the DMS method and its enhancements for redundancy prevention and
elimination (yielding ODMS) in the DLV system [4], the state-of-the-art implementation
of disjunctive datalog. Both DMS and ODMS are fully integrated in the DLV
system, and they are completely transparent to the end user, who can simply enable them
by setting the corresponding option. The interested reader can retrieve from
http://www.dlvsystem.com/magic/ a downloadable executable of the DLV system
in which an option for using DMS or ODMS is provided; the same URL contains some
hints for its usage.

> We evaluate the efficiency of the implemented methods. We have performed
extensive experiments using benchmarks reported in the literature, comparing the
performance of the DLV system without optimization, with APM of [13], with DMS, and with
ODMS. These experiments show that our methods, especially ODMS, yield speedups
in many cases and only rarely produce mild overheads w.r.t. the native DLV system,
greatly improving on APM of [13].



2 Preliminaries 

2.1 Disjunctive Datalog Queries 

A disjunctive rule $r$ is of the form $a_1 \vee \cdots \vee a_n$ :- $b_1, \ldots, b_k$., where $a_1, \ldots, a_n$,
$b_1, \ldots, b_k$ are atoms and $n \geq 1$, $k \geq 0$. The disjunction $a_1 \vee \cdots \vee a_n$ is the head
of $r$, while the conjunction $b_1, \ldots, b_k$ is the body of $r$. Moreover, let $H(r) = \{a_1, \ldots, a_n\}$
and $B(r) = \{b_1, \ldots, b_k\}$. A non-disjunctive rule with an empty body (i.e. $n = 1$
and $k = 0$) is called a fact. If a predicate is defined only by facts, it is referred to
as an EDB predicate, otherwise as an IDB predicate. Throughout this paper, we assume that
rules are safe, that is, each variable of a rule $r$ appears in a positive literal of the body
of $r$. A disjunctive datalog program (Datalog$^{\vee}$ program for short) $\mathcal{P}$ is a finite set of rules;
if $\mathcal{P}$ is disjunction-free, then it is a datalog program (Datalog program). A query $Q$ is a
non-empty conjunction $b_1, \ldots, b_k$ of atoms.

Given a program $\mathcal{P}$, we denote by $ground(\mathcal{P})$ the set of all the rules obtained by
applying to each rule $r \in \mathcal{P}$ all possible substitutions from the variables in $r$ to the set
of all the constants in $\mathcal{P}$. The semantics of a program $\mathcal{P}$ is given by the set $MM(\mathcal{P})$
of the subset-minimal models of $\mathcal{P}$. Note that on Datalog$^{\vee}$ the notion of answer set [1]
coincides with the notion of minimal model.

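Let $\mathcal{P}$ be a Datalog$^{\vee}$ program and let $\mathcal{F}$ be a set of facts. Then, we denote by $\mathcal{P}_{\mathcal{F}}$
the program $\mathcal{P}_{\mathcal{F}} = \mathcal{P} \cup \mathcal{F}$. Given a query $Q$ and an interpretation $M$ of $\mathcal{P}$, $\vartheta(Q, M)$
denotes the set containing each substitution $\phi$ for the variables in $Q$ such that $\phi(Q)$ is
true in $M$. The answer to a query $Q$ over $\mathcal{P}_{\mathcal{F}}$, under the brave semantics, denoted by
$Ans_b(Q, \mathcal{P}_{\mathcal{F}})$, is the set $\bigcup_M \vartheta(Q, M)$, such that $M \in MM(\mathcal{P} \cup \mathcal{F})$. The answer to a
query $Q$ over the facts in $\mathcal{F}$, under the cautious semantics, denoted by $Ans_c(Q, \mathcal{P}_{\mathcal{F}})$,
is the set $\bigcap_M \vartheta(Q, M)$, such that $M \in MM(\mathcal{P} \cup \mathcal{F}) \neq \emptyset$. If $MM(\mathcal{P} \cup \mathcal{F}) = \emptyset$,
then all substitutions over the universe for variables in $Q$ are in the cautious answer.
Finally, we say that programs $\mathcal{P}$ and $\mathcal{P}'$ are bravely (resp. cautiously) equivalent w.r.t.
$Q$, denoted by $\mathcal{P} \equiv_{Q,b} \mathcal{P}'$ (resp. $\mathcal{P} \equiv_{Q,c} \mathcal{P}'$), if for any set $\mathcal{F}$ of facts
$Ans_b(Q, \mathcal{P}_{\mathcal{F}}) = Ans_b(Q, \mathcal{P}'_{\mathcal{F}})$ (resp. $Ans_c(Q, \mathcal{P}_{\mathcal{F}}) = Ans_c(Q, \mathcal{P}'_{\mathcal{F}})$).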
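
As an illustration of the two semantics, the following Python sketch enumerates the subset-minimal models of a small ground program by brute force and evaluates a ground query bravely and cautiously. The rule encoding, the toy program and the function names are assumptions made for this illustration only; they are not part of DLV, and queries are restricted to single ground atoms so that substitutions need not be handled.

```python
from itertools import chain, combinations

# A ground rule is a pair (head_atoms, body_atoms); a fact has an empty body.
# Hypothetical toy program:  p(1) v q(2) :- a(1,2).   a(1,2).
RULES = [
    (frozenset({"p(1)", "q(2)"}), frozenset({"a(1,2)"})),
    (frozenset({"a(1,2)"}), frozenset()),
]

def is_model(interp, rules):
    """A rule is satisfied if its body is not fully true or some head atom is true."""
    return all(not body <= interp or head & interp for head, body in rules)

def minimal_models(rules):
    """Enumerate all interpretations and keep the subset-minimal models."""
    atoms = sorted(set().union(*(head | body for head, body in rules)))
    subsets = chain.from_iterable(
        combinations(atoms, size) for size in range(len(atoms) + 1))
    models = [frozenset(s) for s in subsets if is_model(frozenset(s), rules)]
    return [m for m in models if not any(other < m for other in models)]

def brave(atom, rules):
    """True if the ground atom holds in at least one minimal model."""
    return any(atom in m for m in minimal_models(rules))

def cautious(atom, rules):
    """True if the ground atom holds in every minimal model."""
    return all(atom in m for m in minimal_models(rules))
```

On this toy program the two minimal models disagree on p(1), so the atom is a brave but not a cautious consequence, while the fact a(1,2) is both. The enumeration is exponential in the number of atoms and is meant only to make the definitions concrete.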

2.2 Magic-Set for Non-disjunctive Datalog Queries 

We will illustrate how the Magic-Set method simulates the top-down evaluation of a
query by considering the program consisting of the rules path(X, Y) :- edge(X, Y). and
path(X, Y) :- edge(X, Z), path(Z, Y). together with the query path(1, 5)?.

Adornment Step: The key idea is to materialize, by suitable adornments, binding
information for IDB predicates which would be propagated during a top-down computation.
Adornments are strings of the letters b and f, denoting bound or free for each argument of an
IDB predicate. First, adornments are created for query predicates. The adorned version
of the query above is path$^{bb}$(1, 5).

The query adornments are then used to propagate their information into the body of
the rules defining it, simulating a top-down evaluation. Obviously various strategies can
be pursued concerning the order of processing the body atoms and the propagation of
bindings. These are referred to as Sideways Information Passing Strategies (SIPS), cf.
[9]. Any SIPS must guarantee an iterative processing of all body atoms in $r$. Let $q$ be
an atom that has not yet been processed, and $v$ be the set of already considered atoms;
then a SIPS specifies a propagation $v \rightarrow_{\chi} q$, where $\chi$ is the set of the variables bound
by $v$, passing their values to $q$.

In the first rule of the example (path(X, Y) :- edge(X, Y).) a binding is only passed
to the EDB predicate edge (which will not be adorned), yielding the adorned rule
path$^{bb}$(X, Y) :- edge(X, Y). In the second rule, path$^{bb}$(X, Y) passes its binding
information to edge(X, Z) by path$^{bb}$(X, Y) $\rightarrow_{\{X\}}$ edge(X, Z). edge(X, Z) itself is not
adorned, but it gives a binding to Z. Then, we consider path(Z, Y), for which we obtain
the propagation path$^{bb}$(X, Y), edge(X, Z) $\rightarrow_{\{Y,Z\}}$ path(Z, Y). This causes the
generation of the adorned atom path$^{bb}$(Z, Y), and the resulting adorned rule is
path$^{bb}$(X, Y) :- edge(X, Z), path$^{bb}$(Z, Y).

In general, adorning a rule may generate new adorned predicates. This step is
repeated until all adorned predicates have been processed, yielding the adorned program;
in our example it consists of the rules path$^{bb}$(X, Y) :- edge(X, Y). and
path$^{bb}$(X, Y) :- edge(X, Z), path$^{bb}$(Z, Y).
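
The adornment of a single rule can be sketched in Python as follows. The sketch assumes one particular SIPS (body atoms are processed left to right and every binding available so far is passed on), which is only one of the strategies the text allows; the tuple encoding of atoms and the function name `adorn_rule` are hypothetical choices made for this illustration.

```python
def is_var(term):
    """Upper-case strings are variables; everything else is a constant."""
    return isinstance(term, str) and term[:1].isupper()

def adorn_rule(head, adornment, body, idb):
    """Adorn one rule under a left-to-right SIPS that passes on all bindings.

    head and body atoms are tuples (predicate, arg1, ...); adornment is a
    string over 'b'/'f' for the head arguments; idb is the set of IDB names.
    """
    pred, *args = head
    # Variables in bound head positions are bound when the body is entered.
    bound = {a for a, flag in zip(args, adornment) if flag == "b" and is_var(a)}
    adorned_body = []
    for name, *terms in body:
        if name in idb:
            # An argument is bound if it is a constant or an already bound variable.
            flags = "".join(
                "b" if (not is_var(t) or t in bound) else "f" for t in terms)
            adorned_body.append((f"{name}^{flags}", *terms))
        else:
            adorned_body.append((name, *terms))  # EDB atoms are not adorned
        # After an atom is processed, all of its variables count as bound.
        bound |= {t for t in terms if is_var(t)}
    return (f"{pred}^{adornment}", *args), adorned_body

RECURSIVE_RULE = (("path", "X", "Y"), [("edge", "X", "Z"), ("path", "Z", "Y")])
ADORNED = adorn_rule(RECURSIVE_RULE[0], "bb", RECURSIVE_RULE[1], {"path"})
```

For the recursive rule of the example and the head adornment bb, the sketch reproduces the adorned rule given in the text; a full adornment step would additionally collect newly created adorned predicates and process them in turn.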
Generation Step: The adorned program is used to generate magic rules, which simulate
the top-down evaluation scheme. Let the magic version $magic(p^{\alpha})$ for an adorned atom
$p^{\alpha}$ be defined as magic_$p^{\alpha}$ in which all arguments labeled $f$ in $\alpha$ are eliminated.

Then, for each adorned atom $p$ in the body of an adorned rule $r_a$, a magic rule
$r_m$ is generated such that (i) the head of $r_m$ consists of $magic(p)$, and (ii) the body
of $r_m$ consists of the magic version of the head atom of $r_a$, followed by all of the
predicates of $r_a$ which can propagate the binding on $p$. In our example we generate
magic_path$^{bb}$(Z, Y) :- magic_path$^{bb}$(X, Y), edge(X, Z).

Modification Step: The adorned rules are modified by including magic atoms
generated in Step 2 in the rule bodies. The resultant rules are called modified rules. For
each adorned rule whose head is $h$, we extend the rule body by inserting $magic(h)$. In
our example, path$^{bb}$(X, Y) :- magic_path$^{bb}$(X, Y), edge(X, Y). and path$^{bb}$(X, Y) :-
magic_path$^{bb}$(X, Y), edge(X, Z), path$^{bb}$(Z, Y). are generated.

Processing of the Query: For each adorned atom $g^{\alpha}$ of the query, (1) the magic seed
$magic(g^{\alpha})$. is asserted, and (2) a rule $g$ :- $g^{\alpha}$ is produced. In our example we generate
magic_path$^{bb}$(1, 5). and path(X, Y) :- path$^{bb}$(X, Y).

The complete rewritten program consists of the magic, modified, and query rules.
Given a non-disjunctive datalog program $\mathcal{P}$, a query $Q$, and the rewritten program $\mathcal{P}'$,
it is well known (see e.g. [7]) that $\mathcal{P}$ and $\mathcal{P}'$ are equivalent w.r.t. $Q$, i.e., $\mathcal{P} \equiv_{Q,b} \mathcal{P}'$
and $\mathcal{P} \equiv_{Q,c} \mathcal{P}'$ hold (since brave and cautious semantics coincide for non-disjunctive
datalog programs).
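
To make the effect of the rewriting concrete, the sketch below evaluates both the original path program and its Magic-Set rewriting bottom-up in Python. The naive evaluator, the tuple encoding of atoms, the predicate spellings `path_bb` and `magic_path_bb`, and the edge relation are all assumptions of this illustration; only the rules themselves are taken from the example above.

```python
def is_var(term):
    """Upper-case strings are variables; everything else is a constant."""
    return isinstance(term, str) and term[:1].isupper()

def match(atom, fact, subst):
    """Extend subst so that atom becomes equal to fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if subst.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return subst

def evaluate(rules, facts):
    """Naive bottom-up evaluation of positive, non-disjunctive, safe rules."""
    known = set(facts)
    while True:
        new = set()
        for head, body in rules:
            substs = [{}]
            for atom in body:
                substs = [s2 for s in substs for fact in known
                          if (s2 := match(atom, fact, s)) is not None]
            for s in substs:
                new.add((head[0],) + tuple(s.get(t, t) for t in head[1:]))
        if new <= known:
            return known
        known |= new

# Hypothetical edge relation: a chain 1..5 and an unrelated chain 10..13.
EDGES = {("edge", a, b) for a, b in
         [(1, 2), (2, 3), (3, 4), (4, 5), (10, 11), (11, 12), (12, 13)]}

ORIGINAL = [
    (("path", "X", "Y"), [("edge", "X", "Y")]),
    (("path", "X", "Y"), [("edge", "X", "Z"), ("path", "Z", "Y")]),
]

REWRITTEN = [
    # magic rule
    (("magic_path_bb", "Z", "Y"), [("magic_path_bb", "X", "Y"), ("edge", "X", "Z")]),
    # modified rules
    (("path_bb", "X", "Y"), [("magic_path_bb", "X", "Y"), ("edge", "X", "Y")]),
    (("path_bb", "X", "Y"),
     [("magic_path_bb", "X", "Y"), ("edge", "X", "Z"), ("path_bb", "Z", "Y")]),
    # query rule
    (("path", "X", "Y"), [("path_bb", "X", "Y")]),
]
SEED = {("magic_path_bb", 1, 5)}  # magic seed for the query path(1, 5)?

def count(database, predicate):
    return sum(1 for fact in database if fact[0] == predicate)

FULL = evaluate(ORIGINAL, EDGES)
FOCUSED = evaluate(REWRITTEN, EDGES | SEED)
```

Both evaluations derive path(1, 5), but the rewritten program only derives path facts that are reachable from the magic seed and never touches the second chain, which is the reduction of the instantiation that the method aims at.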

3 Magic-Set Method for Disjunctive Datalog Programs 

In this section we present the Disjunctive Magic-Set algorithm (DMS for short) for the
optimization of disjunctive datalog programs, which has been implemented and integrated
into the DLV system [4]. Before discussing the details of the algorithm, we informally
present the main ideas that have been exploited for enabling the Magic-Set method to 
work on disjunctive programs. 

3.1 Binding Propagation in Datalog v Programs: Some Key Issues 

As first observed in [13], while in nondisjunctive programs bindings are propagated
only head-to-body, any rewriting for disjunctive programs has to propagate bindings
also head-to-head in order to preserve soundness. Roughly, suppose that a predicate
p is relevant for the query, and a disjunctive rule $r$ contains p(X) in the head.
Then, besides propagating the binding from p(X) to the body of $r$ (as in the
nondisjunctive case), a sound rewriting has to propagate the binding also from p(X) to the
other head atoms of $r$. Consider, for instance, a Datalog$^{\vee}$ program $\mathcal{P}$ containing the rule
p(X) v q(Y) :- a(X, Y), r(X). and the query p(1)?. Even though the query propagates
the binding for the predicate p, in order to correctly answer the query, we also need
to evaluate the truth value of q(Y), which indirectly receives the binding through the
body predicate a(X, Y). For instance, suppose that the program contains the facts a(1, 2)
and r(1); then atom q(2) is relevant for query p(1)? (i.e., it should belong to the magic
set of the query), since the truth of q(2) would invalidate the derivation of p(1) from







Chiara Cumbo et al. 



the above rule, because of the minimality of the semantics. It follows that, while prop- 
agating the binding, the head atoms of disjunctive rules must be all adorned as well. 

However, the adornment of the head of one disjunctive rule $r$ may give rise to
multiple rules, having different adornments for the head predicates. This process can be
seen as “splitting” $r$ into multiple rules. While this is not a problem in the
nondisjunctive case, the semantics of a disjunctive program may be affected.
Consider, for instance, the program p(X, Y) v q(Y, X) :- a(X, Y). in which p and q are
mutually exclusive (due to minimality), since they do not appear in any other rule
head. Assuming the adornments p$^{bf}$ and q$^{bf}$ to be propagated, we might obtain rules
whose heads have the form p$^{bf}$(X, Y) v q$^{fb}$(Y, X) (derived while propagating p$^{bf}$) and
p$^{fb}$(X, Y) v q$^{bf}$(Y, X) (derived while propagating q$^{bf}$). These rules could support two
atoms p$^{bf}$(m, n) and q$^{bf}$(n, m), while in the original program p(m, n) and q(n, m) could
not hold simultaneously (due to semantic minimality), thus changing the original
semantics.

The method proposed in [13] circumvents this problem by using some auxiliary
predicates which collect all facts coming from the different adornments. For instance, in
the above example, two rules of the form collect_p(X, Y) :- p$^{fb}$(X, Y). and
collect_p(X, Y) :- p$^{bf}$(X, Y). are added for predicate p. The main drawback of this approach is
that collecting predicates, while resolving the semantic problem, bloat the program with
additional rules, reducing the gain of the optimization.

A relevant advantage of our algorithm (confirmed also by an experimental analysis) 
is that we do not use collecting predicates; rather, we preserve the correct semantics 
by stripping off the adornments from non-magic predicates in modified rules. Other 
computational advantages come from our adornment technique, which is obtained by 
extending non-disjunctive SIPS to the disjunctive case. 



3.2 DMS Algorithm 

The salient feature of our algorithm is that we generate modified and magic rules on
a rule-by-rule basis. To this end, we exploit a stack $S$ of predicates for storing all the
adorned predicates to be used for propagating the binding of the query: at each step, an
element is removed from $S$, and its defining rules are processed one at a time. Thus, adorned
rules do not have to be stored.

The algorithm DMS (see Figure 1) implements the Magic-Set method for disjunctive
programs. Its input is a disjunctive datalog program $\mathcal{P}$ and a query $Q$. Note that the
algorithm can be used for non-disjunctive rules as a special case. If the query contains
some non-free IDB predicates, it outputs an (optimized) program $DMS(Q, \mathcal{P})$ consisting
of a set of modified and magic rules, stored by means of the sets $modifiedRules(Q, \mathcal{P})$
and $magicRules(Q, \mathcal{P})$, respectively. The main steps of the algorithm DMS are
illustrated by means of the following running example.

Example 1 (Strategic Companies [14]). We are given a collection $C$ of companies
producing some goods in a set $G$, such that each company $c_i \in C$ is controlled by a set of
other companies $O_i \subseteq C$. A subset of the companies $C' \subseteq C$ is a strategic set if it
is a minimal set of companies producing all the goods in $G$, such that if $O_i \subseteq C'$ for
Input: A Datalog$^{\vee}$ program $\mathcal{P}$, and a query $Q = g_1(t_1), \ldots, g_n(t_n)$.
Output: The optimized program $DMS(Q, \mathcal{P})$.
var $S$: stack of adorned predicates; $modifiedRules(Q, \mathcal{P})$, $magicRules(Q, \mathcal{P})$: set of rules;
begin
 1. if $g_1(t_1), \ldots, g_n(t_n)$ has some IDB predicate then
 2.   $modifiedRules(Q, \mathcal{P}) := \emptyset$; $(S, magicRules(Q, \mathcal{P})) := BuildQuerySeeds(Q)$;
 3.   while $S \neq \emptyset$ do
 4.     $p^{\alpha} := S.pop()$;
 5.     for each rule $r \in \mathcal{P}$: $p(t) \vee p_1(t_1) \vee \ldots \vee p_n(t_n)$ :- $q_1(s_1), \ldots, q_m(s_m)$ do
 6.       $r_a := Adorn(r, p^{\alpha}, S)$;
 7.       $magicRules(Q, \mathcal{P}) := magicRules(Q, \mathcal{P}) \cup Generate(r_a)$;
 8.       $modifiedRules(Q, \mathcal{P}) := modifiedRules(Q, \mathcal{P}) \cup \{Modify(r_a)\}$;
 9.     end for
10.   end while
11.   $DMS(Q, \mathcal{P}) := magicRules(Q, \mathcal{P}) \cup modifiedRules(Q, \mathcal{P})$;
12.   return $DMS(Q, \mathcal{P})$;
13. end if
end.

Fig. 1. Disjunctive Magic-Set Method



some $i = 1, \ldots, m$ then $c_i \in C'$ must hold. This scenario can be modelled by means of
the following program $\mathcal{P}_{sc}$.

$r_1$ : sc(C1) v sc(C2) :- produced_by(P, C1, C2).

$r_2$ : sc(C) :- controlled_by(C, C1, C2, C3), sc(C1), sc(C2), sc(C3).

Moreover, given a company $c \in C$, we consider a query $Q_{sc}$ = sc(c) asking whether
$c$ belongs to some strategic set of $C$. □
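
The strategic sets of a small instance can be computed directly from the definition, which may help when reading the rewriting steps that follow. In the Python sketch below the instance (companies a to d, goods g1 and g2) is hypothetical, and a controller is repeated in the `controlled_by` tuple because company d is assumed to have only two controllers; the brute-force enumeration stands in for the minimal-model computation and is not how DLV evaluates the program.

```python
from itertools import combinations

# Hypothetical instance in the format of the EDB predicates of the example:
# produced_by(product, c1, c2) and controlled_by(c, c1, c2, c3).
COMPANIES = ["a", "b", "c", "d"]
PRODUCED_BY = [("g1", "a", "b"), ("g2", "b", "c")]
CONTROLLED_BY = [("d", "a", "c", "c")]

def is_admissible(candidate):
    """Check rules r1 and r2 for a candidate set, ignoring minimality."""
    covers_goods = all(
        c1 in candidate or c2 in candidate for _, c1, c2 in PRODUCED_BY)
    respects_control = all(
        c in candidate or not {c1, c2, c3} <= candidate
        for c, c1, c2, c3 in CONTROLLED_BY)
    return covers_goods and respects_control

def strategic_sets():
    """Subset-minimal admissible sets, i.e. the strategic sets of the instance."""
    candidates = [
        frozenset(s)
        for size in range(len(COMPANIES) + 1)
        for s in combinations(COMPANIES, size)
        if is_admissible(frozenset(s))
    ]
    return [s for s in candidates if not any(t < s for t in candidates)]

def brave_sc(company):
    """Brave answer to the query sc(company)?"""
    return any(company in s for s in strategic_sets())
```

For this instance there are two strategic sets, one consisting of b alone and one consisting of a, c and d, so the query sc(d) has a positive brave answer.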

The computation starts in step 2 by initializing $modifiedRules(Q, \mathcal{P})$ to the empty
set. Then, the function BuildQuerySeeds is used for storing in $magicRules(Q, \mathcal{P})$ the
magic seeds, and pushing on the stack $S$ the adorned predicates of $Q$. Note that we
do not generate any query rules, because the transformed program will not contain
adornments.

Example 2. Given the query $Q_{sc}$ = sc(c) and the program $\mathcal{P}_{sc}$, BuildQuerySeeds
creates magic_sc$^{b}$(c). and pushes sc$^{b}$ onto the stack $S$. □

The core of the technique (steps 4-9) is repeated until the stack $S$ is empty, i.e.,
until there is no further adorned predicate to be propagated. Specifically, an adorned
predicate $p^{\alpha}$ is removed from the stack $S$ in step 4, and its binding is propagated in
each (disjunctive) rule $r$ in $\mathcal{P}$ of the form

$r$ : $p(t) \vee p_1(t_1) \vee \ldots \vee p_n(t_n)$ :- $q_1(s_1), \ldots, q_m(s_m)$.
with $n \geq 0$, having an atom $p(t)$ in the head (step 5).

Adorn. Step 6 performs the adornment of the rule. Unlike in the case of
non-disjunctive programs, the binding of the predicate $p^{\alpha}$ needs to be also propagated to
the atoms $p_1(t_1), \ldots, p_n(t_n)$ in the head. We achieve this by defining an extension of
any non-disjunctive SIPS to the disjunctive case. The constraint for such a disjunctive
SIPS is that head atoms (different from $p(t)$) cannot provide variable bindings; they can
only receive bindings (similarly to negative literals in standard SIPS). So they should
be processed only once all their variables are bound or do not occur in yet unprocessed
body atoms$^2$. Moreover, they cannot make any of their free variables bound.

The function Adorn produces an adorned disjunctive rule from an adorned predicate
and a suitable unadorned rule by employing the refined SIPS, pushing all newly adorned
predicates onto $S$. Hence, in step 6 the rule $r_a$ is of the form

$r_a$ : $p^{\alpha}(t) \vee p_1^{\alpha_1}(t_1) \vee \ldots \vee p_n^{\alpha_n}(t_n)$ :- $q_1^{\beta_1}(s_1), \ldots, q_m^{\beta_m}(s_m)$.

Example 3. Consider again Example 1. When sc$^{b}$ is removed from the stack, we first
select rule $r_1$ and the head predicate sc(C1). Then, the adorned version is

$r'_{1a}$ : sc$^{b}$(C1) v sc$^{b}$(C2) :- produced_by(P, C1, C2).

Next $r_1$ is processed again, this time with head predicate sc(C2), producing

$r''_{1a}$ : sc$^{b}$(C2) v sc$^{b}$(C1) :- produced_by(P, C1, C2).

and finally, processing $r_2$ we obtain

$r_{2a}$ : sc$^{b}$(C) :- controlled_by(C, C1, C2, C3), sc$^{b}$(C1), sc$^{b}$(C2), sc$^{b}$(C3). □

Generate. The algorithm uses the adorned rule r a for generating and collecting the 
magic rules in step 7. Since r a is a disjunctive rule, Generate first produces a non- 
disjunctive intermediate rule by moving head atoms into the body. Then, the standard 
technique for Datalog rules, as described in Generation Step in Section 2, is applied. 

Example 4. In the program of Example 3, from the rule $r'_{1a}$ first its non-disjunctive
intermediate rule

sc$^{b}$(C1) :- sc$^{b}$(C2), produced_by(P, C1, C2).

is produced, from which the magic rule

magic_sc$^{b}$(C2) :- magic_sc$^{b}$(C1), produced_by(P, C1, C2).

is generated. Similarly, from the rule $r''_{1a}$ we obtain

magic_sc$^{b}$(C1) :- magic_sc$^{b}$(C2), produced_by(P, C1, C2).

and finally $r_{2a}$ gives rise to the following rules

magic_sc$^{b}$(C1) :- magic_sc$^{b}$(C), controlled_by(C, C1, C2, C3).
magic_sc$^{b}$(C2) :- magic_sc$^{b}$(C), controlled_by(C, C1, C2, C3).
magic_sc$^{b}$(C3) :- magic_sc$^{b}$(C), controlled_by(C, C1, C2, C3).

□

2 Recall that the safety constraint guarantees that each variable of a head atom also appears in 
some positive body-atom. 
Modify. In step 8 the modified rules are generated and collected. The only difference to
the non-disjunctive case is that the adornments are stripped off the original atoms (see
Section 3.1). Hence, the function Modify constructs a rule of the following form

$p(t) \vee p_1(t_1) \vee \ldots \vee p_n(t_n)$ :- $magic(p^{\alpha}(t)), magic(p_1^{\alpha_1}(t_1)), \ldots, magic(p_n^{\alpha_n}(t_n)),$
$q_1(s_1), \ldots, q_m(s_m)$.



Finally, after all the adorned predicates have been processed, the algorithm outputs the
program $DMS(Q, \mathcal{P})$.

Example 5. In our running example, we derive the following set of modified rules:

$r_1^{\prime m}:\ sc(C_1) \vee sc(C_2) \ \text{:-}\ magic\_sc^{b}(C_1), magic\_sc^{b}(C_2), produced\_by(P, C_1, C_2).$

$r_1^{\prime\prime m}:\ sc(C_2) \vee sc(C_1) \ \text{:-}\ magic\_sc^{b}(C_2), magic\_sc^{b}(C_1), produced\_by(P, C_1, C_2).$

$r_2^{m}:\ sc(C) \ \text{:-}\ magic\_sc^{b}(C), controlled\_by(C, C_1, C_2, C_3), sc(C_1), sc(C_2), sc(C_3).$

where $r_1^{\prime m}$ (resp. $r_1^{\prime\prime m}$, $r_2^{m}$) is derived by adding magic predicates and stripping off
adornments for the rule $r_1^{\prime a}$ (resp. $r_1^{\prime\prime a}$, $r_2^{a}$). Thus, the optimized program $DMS(Q_{sc}, \mathcal{P}_{sc})$
comprises the above modified rules as well as the magic rules in Example 4, and the
magic seed $magic\_sc^{b}(c)$. □
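As a rough illustration of Modify, the following Python sketch adds one magic atom per head atom and strips the adornments from all original atoms. The tuple encoding and the function names are hypothetical choices for this example only.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (predicate, adornment, arguments); adornment is None when unadorned.
Atom = Tuple[str, Optional[str], Tuple[str, ...]]

def magic(atom: Atom) -> Atom:
    """Magic version of an adorned atom: keep only the bound ('b') arguments."""
    pred, adorn, args = atom
    bound = tuple(a for a, flag in zip(args, adorn) if flag == "b")
    return ("magic_" + pred, adorn, bound)

def strip(atom: Atom) -> Atom:
    """Drop the adornment of an atom."""
    return (atom[0], None, atom[2])

def modify(head: List[Atom], body: List[Atom]) -> Tuple[List[Atom], List[Atom]]:
    """Modified rule: unadorned head, magic atoms for every head atom, unadorned body."""
    new_head = [strip(h) for h in head]
    new_body = [magic(h) for h in head] + [strip(b) for b in body]
    return new_head, new_body

def show(atom: Atom) -> str:
    pred, adorn, args = atom
    name = pred if adorn is None else pred + "^" + adorn
    return name + "(" + ", ".join(args) + ")"

def show_rule(head: List[Atom], body: List[Atom]) -> str:
    return " v ".join(show(a) for a in head) + " :- " + ", ".join(show(a) for a in body) + "."

# Adorned version of r_2 from Example 3.
r2a_head = [("sc", "b", ("C",))]
r2a_body = [("controlled_by", None, ("C", "C1", "C2", "C3")),
            ("sc", "b", ("C1",)), ("sc", "b", ("C2",)), ("sc", "b", ("C3",))]
print(show_rule(*modify(r2a_head, r2a_body)))
```

The printed rule has the same shape as the third modified rule of Example 5.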

3.3 Query Equivalence Results 

We conclude the presentation of the DMS algorithm by formally proving its soundness.
To this end, the proofs in [13] cannot be reused, due to the many differences w.r.t. our
approach. The result is shown by first establishing a relationship between the minimal
models of the program $DMS(Q, \mathcal{P})$ and of the program $rel(Q, \mathcal{P})$ constructed as follows.

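Given a set $S$ of ground rules of $\mathcal{P}$, we denote by $R(S)$ the set
$\{r \in ground(\mathcal{P}) \mid \exists r' \in S, \exists q \in B(r') \cup H(r') \text{ s.t. } q \in H(r)\}$.
Then, $rel(Q, \mathcal{P})$ is the least fixed point of the following sequence:
$rel_0(Q, \mathcal{P}) = \{r \in ground(\mathcal{P}) \mid \exists\, ground(q) \in Q \cap H(r)\}$,
and $rel_{i+1}(Q, \mathcal{P}) = R(rel_i(Q, \mathcal{P}))$, for each $i \ge 0$.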
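The construction of the relevant rules can be sketched as a simple fixpoint iteration. In this Python sketch ground atoms are plain strings and a ground rule is a pair of frozensets (head, body); this encoding and the names `rule`, `R`, and `rel` are assumptions made for illustration.

```python
def rule(head, body=()):
    """Hypothetical encoding of a ground rule as (head atoms, body atoms)."""
    return (frozenset(head), frozenset(body))

def R(S, ground):
    """Rules of `ground` whose head contains an atom occurring in some rule of S."""
    needed = set()
    for head, body in S:
        needed |= head | body
    return {r for r in ground if r[0] & needed}

def rel(query_atoms, ground):
    """Iterate rel_0, rel_1, ... until the sequence becomes stationary."""
    current = {r for r in ground if r[0] & set(query_atoms)}
    while True:
        nxt = R(current, ground)
        if nxt == current:
            return current
        current = nxt

# Small ground program: only the first three rules matter for the query atom "a".
program = {
    rule({"a", "b"}, {"c"}),
    rule({"c"}, {"d"}),
    rule({"d"}),
    rule({"e"}, {"f"}),
}
relevant = rel({"a"}, program)
```

Here `relevant` contains every rule except `e :- f`, since neither `e` nor `f` is reachable from the query atom through heads and bodies.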

Notice that the correspondence between the models of $DMS(Q, \mathcal{P})$ and of $rel(Q, \mathcal{P})$
can be established by focusing on non-magic atoms only. Thus, we next exploit the
following notation. Given a model $M$ and a predicate symbol $g$, we denote by $M[g]$ the
set of atoms in $M$ whose predicate symbol is $g$. Then, $M[\mathcal{P}]$ denotes the set of atoms
in $M$ whose predicate symbol appears in the head of some rule of $\mathcal{P}$. Finally, given a
set of interpretations $S$, let $S[g] = \{M[g] \mid M \in S\}$ and $S[\mathcal{P}] = \{M[\mathcal{P}] \mid M \in S\}$.

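Lemma 1. Let $\mathcal{P}$ be a Datalog$^{\vee}$ program and let $Q$ be a query. Then, it holds that
$\forall M' \in \mathcal{MM}(DMS(Q, \mathcal{P}))$, $\exists M \in \mathcal{MM}(rel(Q, \mathcal{P}))$ s.t. $M = M'[rel(Q, \mathcal{P})]$.

Lemma 2. Let $\mathcal{P}$ be a Datalog$^{\vee}$ program and let $Q$ be a query. Then, it holds that
$\forall M \in \mathcal{MM}(rel(Q, \mathcal{P}))$, $\exists M' \in \mathcal{MM}(DMS(Q, \mathcal{P}))$ s.t. $M = M'[rel(Q, \mathcal{P})]$.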

Armed with the above results, we can prove the following. 

Theorem 1 (Soundness of the DMS Algorithm). Let $\mathcal{P}$ be a Datalog$^{\vee}$ program, and let
$Q$ be a query. Then, $DMS(Q, \mathcal{P}) \equiv_{Q,b} \mathcal{P}$ and $DMS(Q, \mathcal{P}) \equiv_{Q,c} \mathcal{P}$ hold.

Proof (Sketch). Let $\overline{rel}(Q, \mathcal{P})$ denote the set $ground(\mathcal{P}) - rel(Q, \mathcal{P})$. By Lemmas
1 and 2, it suffices to prove that $rel(Q, \mathcal{P}) \equiv_{Q,b} \mathcal{P}$ and $rel(Q, \mathcal{P}) \equiv_{Q,c} \mathcal{P}$. In fact,
we can show that $ground(\mathcal{P})$ is partitioned into two modules (see definitions and
notation in [2]), namely $rel(Q, \mathcal{P})$ and $\overline{rel}(Q, \mathcal{P})$, that can be hierarchically evaluated. Thus,
the models of $\mathcal{P}$ are such that $\mathcal{MM}(\mathcal{P}) = \bigcup_{M} \mathcal{MM}(M \cup \overline{rel}(Q, \mathcal{P}))$, for each
$M \in \mathcal{MM}(rel(Q, \mathcal{P}))$, where for the sake of simplicity, the model $M$ is also used
for denoting the set of the corresponding ground facts in it.

The result follows by observing that for each predicate $q$ in $Q$, $\mathcal{MM}(\mathcal{P})[q] =
(\mathcal{MM}(\mathcal{P})[rel(Q, \mathcal{P})])[q]$. In fact, we can show that $\mathcal{MM}(\mathcal{P})[rel(Q, \mathcal{P})] =
\mathcal{MM}(rel(Q, \mathcal{P}))$. Then, it suffices to observe that for each predicate $q$ in $Q$, the set
of ground rules having $q$ in the head is in $rel_0(Q, \mathcal{P}) \subseteq rel(Q, \mathcal{P})$. □

4 Redundant Rules: Prevention and Checking 

Both the DMS method described above and APM of [13] have a common drawback: 
Numerous redundant rules may be generated, which can deteriorate the optimization. 
For instance, in Example 5 the first two modified rules coincide, and this might happen 
even if the two head predicates differ. We stress that our rewriting algorithm already 
drastically reduces the impact of such phenomena, as it does not introduce additional 
predicates and rules (apart from magic rules). Nevertheless, since this aspect is crucial 
for the optimization process, we next devise some strategies for further reducing the 
overhead. 

Let $\mathcal{P}$ be a disjunctive datalog program, and let $r_1$ and $r_2$ be two rules of $\mathcal{P}$. Then,
$r_1$ is subsumed by $r_2$ (denoted by $r_1 \sqsubseteq r_2$) if there exists a substitution $\vartheta$ for $H(r_2) \cup
B(r_2)$, such that $\vartheta(H(r_2)) \subseteq H(r_1)$ and $\vartheta(B(r_2)) \subseteq B(r_1)$. Finally, a rule $r_1$ is
redundant if there exists a rule $r_2$ such that $r_1 \sqsubseteq r_2$. Unfortunately, deciding whether a
rule is subsumed by another rule is a hard task:

Theorem 2. Let $\mathcal{P}$ be a disjunctive datalog program, and let $r_1$ and $r_2$ be two rules
of $\mathcal{P}$. Then, the problem of deciding whether $r_1 \sqsubseteq r_2$ is NP-complete in the number of
variables of $r_1$ (program complexity). Hardness holds even for $B(r_1) = B(r_2) = \emptyset$.

The above result strongly motivates the design of methods for preventing the gener- 
ation of redundant rules as well as of polynomial time heuristics for their identification. 
The latter aspect is also of interest outside the context of the Magic-Set method. 
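To make the definition of subsumption concrete, here is a naive backtracking check in Python. It searches for a substitution on the variables of the subsuming rule and is exponential in the worst case, in line with the hardness result; it is not the polynomial heuristic used in DLV. The encoding (atoms as `(predicate, args)` pairs, variables as capitalized strings) and all names are assumptions of this sketch.

```python
def is_var(term):
    """Assumption: variables start with an uppercase letter, constants do not."""
    return term[:1].isupper()

def match(atom, target, theta):
    """Extend theta so that theta(atom) == target; return None if impossible."""
    (pred, args), (tpred, targs) = atom, target
    if pred != tpred or len(args) != len(targs):
        return None
    theta = dict(theta)
    for a, t in zip(args, targs):
        if is_var(a):
            if theta.setdefault(a, t) != t:
                return None
        elif a != t:
            return None
    return theta

def subsumed_by(r1, r2):
    """True iff some substitution maps H(r2) into H(r1) and B(r2) into B(r1).

    Terms of r1 are treated as fixed symbols; only variables of r2 are substituted.
    """
    (h1, b1), (h2, b2) = r1, r2
    todo = [(a, h1) for a in h2] + [(a, b1) for a in b2]

    def search(i, theta):
        if i == len(todo):
            return True
        atom, candidates = todo[i]
        for cand in candidates:
            extended = match(atom, cand, theta)
            if extended is not None and search(i + 1, extended):
                return True
        return False

    return search(0, {})

# p(X) v q(X) :- a(X), b(X).   versus   p(Y) v q(Y) :- a(Y).
r_long = ([("p", ("X",)), ("q", ("X",))], [("a", ("X",)), ("b", ("X",))])
r_short = ([("p", ("Y",)), ("q", ("Y",))], [("a", ("Y",))])
print(subsumed_by(r_long, r_short), subsumed_by(r_short, r_long))
```

The rule with the extra body atom `b(X)` is subsumed by the shorter rule, but not the other way around.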

4.1 Prevention of Redundant Rules 

There are two typical situations in which redundant rules may be generated: (S1) when
adorning a disjunctive rule with two predicates having the same adornment and
arguments, and (S2) when adorning a rule with an adorned predicate, which stems solely
from a previous adornment of the same rule.

Example 6. (S1) Assume that the adorned predicates $p^{b}$ and $s^{b}$ are used for propagating
the binding in the rule $p(X) \vee s(X) \ \text{:-}\ a(X).$ Then, both of the modified rules will
eventually result in $s(X) \vee p(X) \ \text{:-}\ magic\_s^{b}(X), magic\_p^{b}(X), a(X).$ □

The source of the redundancy lies in the fact that disjunctive rules may be adorned
by two distinct predicates ($s^{b}$ and $p^{b}$ in the example) sharing the same bound variables.

Example 7. (S2) Consider the rule $s(X, Z) \vee p(X, Y) \ \text{:-}\ a(X), b(Y), c(Z).$ and the query
$s(1, 2)?$. By adorning with $s^{bb}$ we obtain the modified rule

$r_1:\ s(X, Z) \vee p(X, Y) \ \text{:-}\ magic\_s^{bb}(X, Z), magic\_p^{bf}(X), a(X), b(Y), c(Z).$

and $p^{bf}$ is pushed onto the stack, which gives rise to

$r_2:\ s(X, Z) \vee p(X, Y) \ \text{:-}\ magic\_s^{bf}(X), magic\_p^{bf}(X), a(X), b(Y), c(Z).$

which is not syntactically subsumed by nor subsumes $r_1$.

Nonetheless, if $p^{bf}$ is generated only by the above rule, then $r_2$ will add no
significant information as for the relevance of $s^{bf}$, as it would propagate to $s^{bf}$ the same
binding it had received from predicate $s$ itself. Conversely, if the predicate $p^{bf}$ is
eventually generated by some other rules, then it must also be considered for adorning the
above rule, since it may provide additional new information. □

Situation S1 is easy to handle: In the function Generate we add a check whether
the creation of the modified rule is necessary. Let $r:\ p_1(t_1) \vee \cdots \vee p_n(t_n) \ \text{:-}\ q_1(s_1), \ldots,
q_m(s_m).$ be a disjunctive rule, and $p_i^{\alpha}$ be an adorned predicate that has already been used
for generating the modified rule $r^{m}$. Then, any other adorned predicate $p_j^{\alpha'}$ such that (i)
$p_j$ has the same arguments as $p_i$ and (ii) each argument of $p_i$ has the same adornment
in $\alpha$ and $\alpha'$, will generate for $r$ a modified rule $r^{\prime m}$ with $r^{\prime m} \sqsubseteq r^{m}$.

This check can be implemented by storing for each adorned predicate the set of 
rules it has already adorned, and it can be proven to be sound and complete. 

Situation S2 requires more effort. It implies that an adorned predicate should not
always be applied to the whole program. To achieve this, we associate a target to each
adorned predicate. The first time a predicate $p^{\alpha}$ is pushed on the stack, it is marked
for being used for adorning all the rules but the one that has generated it; this target is
termed allButSource. Then, if at a certain point $p^{\alpha}$ is generated again, two
situations may occur:

- if $p^{\alpha}$ has been marked allButSource and already used for adorning the program
(hence has been removed from the stack), then the new predicate will be inserted in
the stack by marking it for adorning only the rule which was the source of the first
generation of $p^{\alpha}$ (that has not been adorned yet); such a target is called onlySource.

- if $p^{\alpha}$ has not yet been used, then it is simply marked for being used for adorning
the whole program, giving rise to target all.

In the implementation we associate to each adorned predicate also the rule that
generated it. Then, we modify step 6 in Fig. 1 as follows. A rule $r$ considered for being
adorned with a predicate $p^{\alpha}$ is actually adorned if and only if (i) the target of $p^{\alpha}$ is
onlySource and $r$ has generated the adornment $p^{\alpha}$, or (ii) the target of $p^{\alpha}$ is
allButSource and $r$ has not generated the adornment $p^{\alpha}$, or (iii) the target of $p^{\alpha}$ is all.
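The target mechanism for situation S2 can be sketched in a few lines of Python. The function names, the string identifiers for rules and predicates, and the bookkeeping dictionary are hypothetical; the case of a predicate that was already used with a target other than allButSource is not described in the text, so the sketch returns `None` there as a placeholder assumption.

```python
def next_target(seen, pred):
    """Target for a (re)generated adorned predicate.

    `seen` maps an adorned predicate to (target, already_used).
    """
    if pred not in seen:
        return "allButSource"          # first generation
    target, used = seen[pred]
    if target == "allButSource" and used:
        return "onlySource"            # only the source rule is still missing
    if not used:
        return "all"                   # still on the stack: adorn the whole program
    return None                        # assumption: nothing left to adorn

def should_adorn(target, rule_id, source_rule_id):
    """Conditions (i)-(iii) of the modified step 6."""
    if target == "all":
        return True
    if target == "onlySource":
        return rule_id == source_rule_id
    if target == "allButSource":
        return rule_id != source_rule_id
    raise ValueError("unknown target: " + repr(target))

# p^bf is first generated by rule "r"; it must not be used to adorn "r" itself.
first = next_target({}, "p^bf")
print(first, should_adorn(first, "r", "r"), should_adorn(first, "other", "r"))
```

Keeping the source rule next to each adorned predicate is what makes the `rule_id == source_rule_id` comparison possible.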

Due to space limits, we omit the correctness proofs of the above solutions. 

4.2 Identifying Redundant Rules 

Even though the above strategies may significantly reduce the redundancy within the
rewritten program, we also exploit a (post-processing) technique for identifying those
redundant rules whose generation could not be prevented. Specifically, we implemented
a heuristic for rule subsumption checking that has been integrated into the core of the
DLV system, and that can be invoked to identify redundancy in any type of program.
The heuristic is based on the following observation.

Proposition 1. Let $r_1$ and $r_2$ be two disjunctive rules. Then, $r_1$ subsumes $r_2$ if and only
if there exists an ordering of all the atoms in $H(r_1) \cup B(r_1)$ of the form $l_1, \ldots, l_m$ and a
sequence of substitutions $\vartheta_1, \ldots, \vartheta_m$ such that for each $l_i \in B(r_1)$ (resp. $l_i \in H(r_1)$)
there exists $l'_i \in B(r_2)$ (resp. $l'_i \in H(r_2)$) with $(\vartheta_1 \cup \cdots \cup \vartheta_i)(l_i) = \{l'_i\}$.

Roughly, we try to construct the sequence $l_1, \ldots, l_m$ of the above proposition and
the associated substitutions in an incremental way. At each step $i$, we
choose an atom $l_i$ in $r_1$ which has not yet been processed such that there exists a
candidate $l'_i$ in $r_2$ for being subsumed. Moreover, if many atoms in $r_1$ satisfy the above
condition, we greedily select the one which subsumes the maximum number of atoms,
and among these we prefer those with the maximum number of distinct variables not
yet matched.

5 Experimental Results 

5.1 Compared Methods, Benchmark Problems and Data 

In order to evaluate the impact of the proposed methods, we compare DMS and ODMS 
both with the traditional DLV evaluation without Magic-Sets and with the APM method 
proposed in [13]. For the comparison, we consider the following benchmark problems 
that have been already used to assess APM in [13] (see therein for more details): 

- Simple Path: Given a directed graph G and two nodes a and b, does there exist a 
unique path connecting a to b in G? The graph is the same as the one reported in 
[13], and the instances are generated by varying the number of nodes. 

- Ancestor: Given a genealogy graph storing information of relationship (father/
brother) among people and given two persons $p_1$ and $p_2$, is $p_1$ an ancestor of $p_2$?
The structure of the “genealogy” graph is the same as the one presented in [13],
and the instances are generated by varying the number of nodes, i.e., the number of
persons, in the graph.

- Strategic Companies: The problem has been formalized in Example 1. The in- 
stances are generated according to the ideas presented in [13], by grouping the 
companies in suitable clusters. Let G be the cluster such that c is in G. Then, the 
instances are generated with |G| = 250, while the number of companies outside G 
is varied. 



5.2 Results and Discussion 

The experiments have been performed on Pentium III machines running GNU/Linux.
The DLV prototype used was compiled with GCC 2.95. For every instance, we have
allowed a maximum running time of 1800 seconds (half an hour) and a maximum
memory usage of 256MB. On all problems, DMS outperforms APM, even without
considering the time for the rewriting needed in [13], which is also not reported in the figures.

Fig. 2. Simple Path: Execution time (Left) and Number of rule instances (Right)

Fig. 3. Timing in Strategic Companies (Left) and Impact of subsumption checking (Right)


The results for Simple Path are reported in Figure 2. The diagram on the left shows
that DMS scales much better than APM on this problem and that ODMS provides
additional speed-up. The main reason can be understood by looking at the right diagram, in
which the numbers of ground rules are reported: The overhead of the auxiliary rules for
APM is evident, as it generates about 25 times more rules than DMS. We did not include
pure DLV (No Magic) in the diagrams, as it is dramatically slower; e.g., the instance
with 255 nodes takes about 195 seconds. Finally, the experimental results for Ancestor
are very similar to the ones for Simple Path.

On the left of Figure 3 we report the results for Strategic Companies. The advan- 
tages of the Magic-Set method (in both implementations) are evident. Anyhow, we can 
see that APM performs and scales worse than DMS, while ODMS provides even better 
performance and scaling. 

Finally, on the right of Figure 3, we report a more detailed analysis on the
impact of subsumption checking. In particular, we want to check whether the
application of subsumption checking is computationally heavy (how much performance gets
worse in a bad case where no redundant rule is identified). To this end, we test a
program with two types of rules, specifically $r_i:\ p_i(X) \vee q_i(X) \ \text{:-}\ a(X), b(X).$ and
$r'_i:\ p_i(X) \vee q_i(X) \ \text{:-}\ a(X).$, where $a$ and $b$ are EDB predicates, and where each rule of
the form $r_i$ is subsumed by a rule of the form $r'_i$.
We fix a database of 100 facts, and we report the gain, calculated as the difference
between the execution times of DLV without and with subsumption checking by
varying the number of the rules of type $r_i$.

We report three distinct runs: (0) when no redundancy is added, i.e., when there is no
occurrence of rules of type $r'_i$, (1:1) when one redundant rule is added in correspondence
to one non-redundant one, i.e., when a rule of the form $r'_i$ occurs for each rule $r_i$,
and (1:2) when for each pair of rules $r_i$ and $r_{i+1}$ we insert only one occurrence of a
redundant rule, namely either $r'_i$ or $r'_{i+1}$. Importantly, the experiments show that the
implementation is lightweight as in case (0) it does not deteriorate the performance of
DLV. Moreover, it is effective as it leads to a gain up to 3% for case (1:2) and up to
16% for case (1:1).

6 Related Work and Conclusions 

The Magic-Set method is among the most well-known techniques for the optimiza- 
tion of positive recursive Datalog programs due to its efficiency and its generality, even 
though other focused methods, e.g. the supplementary magic set and other special tech- 
niques for linear and chain queries have been proposed as well (see, e.g., [15, 7, 16]). 

After seminal papers [8, 9], the viability of the approach was demonstrated e.g.
in [17, 18]. Later on, extensions and refinements have been proposed, addressing e.g.
query constraints in [10], the well-founded semantics in [11], or integration into
cost-based query optimization in [12]. The research on variations of the Magic-Set method is
still going on. For instance, in [19] a technique for the class of soft-stratifiable programs
is given, and in [13] an elaborated technique for disjunctive programs is described.

In this paper, we have elaborated on the issues addressed in [13]. Our approach is 
similar in spirit to APM, but differs in several respects: 

- DMS avoids the use of auxiliary predicates needed for APM, yielding a significant 
computational benefit. 

- DMS is a flexible framework for enhancements and optimizations, as it proceeds in
a localized fashion by analyzing one rule at a time, while APM processes the whole
program at once.

- ODMS extends DMS by employing effective methods for avoiding the generation
of and for identifying still left-over redundant rules.

- ODMS has been integrated into the DLV system [4], profitably exploiting the DLV
internal data structures and the ability of controlling the grounding module.

- We could experimentally show that our ODMS implementation outperforms APM 
on benchmarks taken from the literature. 

It has been noted (e.g. in [11]) that in the non-disjunctive case, memoing techniques
lead to similar computations as evaluations after Magic-Set transformations. Also in
the disjunctive case such techniques have been proposed, e.g. Hyper Tableaux [20],
for which a similar relationship might hold. However, we leave this issue for future
research, and follow [11] in noting that an advantage of Magic-Sets over such methods
is that they can be more easily combined with other database optimization techniques.

Concerning future work, our objective is to extend the Magic-Set method to the case
of disjunctive programs with constraints and unstratified negation, such that it can be
fruitfully applied on arbitrary DLV programs. We believe that the framework developed
in this paper is general enough to be extended to these more involved cases.

Abstract. We introduce a new method for Rectilinear Steiner Tree
(RST) construction in a graph, using answer set programming. This 
method provides a formal representation of the problem as a logic pro- 
gram whose answer sets correspond to solutions. The answer sets for 
a logic program can be computed by special systems called answer set 
solvers. We describe the method for RST construction in the context 
of VLSI routing where multiple pins in a given placement of a chip are 
connected by an RST. Our method is different from the existing meth- 
ods mainly in three ways. First, it always correctly determines whether a 
given RST routing problem is solvable, and it always produces a solution 
if one exists. Second, some enhancements of the basic problem, in which 
lengths of wires connecting the source pin to sink pins are restricted, can 
be easily represented by adding some rules. Our method guarantees to 
find a tree if one exists, even when the total wire length is not minimum. 
Third, routing problems with the presence of obstacles can be solved. 
With this approach, we have computed solutions to some RST routing 
problems using the answer set solver CMODELS. 



1 Introduction 

The Steiner tree problem is a combinatorial search problem that asks for a 
connected graph spanning a given set of points such that the total “length” of 
edges is minimum. In this paper, we consider a variation of Steiner trees whose 
edges are composed of horizontal or vertical line segments. Such a Steiner tree 
is called a Rectilinear Steiner Tree (RST) [1]. Here the length of an edge is
the number of segments contained in that edge. This problem is NP-complete
[2]. The computational problem we are interested in is to construct an RST
connecting a set of given vertices in an undirected graph in the presence of
obstacles. Consider, for instance, the tree shown in Figure 1 that connects all
points labeled p0, ..., p30 without passing through the obstacles, shown in black.
Since the total number of segments covered by this tree is minimum, this tree is 
an RST. 



B. Demoen and V. Lifschitz (Eds.): ICLP 2004, LNCS 3132, pp. 386-399, 2004.

Fig. 1. An RST routing problem along with its solution.



We introduce a new method for RST construction using answer set programming
(ASP) [3-5], a declarative programming methodology that can be used
to solve combinatorial search problems. The idea of ASP is to represent a given
computational problem as a logic program whose answer sets (stable models) [6,
7] correspond to solutions to the given problem. Systems that compute answer
sets for a logic program are called answer set solvers. For instance, SMODELS with
its front-end LPARSE¹ is one of the answer set solvers that are currently available.
System CMODELS² is another answer set solver, and it also uses LPARSE as its
front-end. In the main part of this paper, we assume that the reader has some
familiarity with the language of LPARSE [4].

We describe the RST construction problem as a logic program in the language 
of LPARSE, and use the answer set solver CMODELS to compute solutions. For 
instance, the solution presented in Figure 1 is computed using CMODELS. 

In this paper, we consider RST construction in the context of VLSI routing.
Automatic routing of wires has been the premier application of RSTs since
Hanan’s original paper [8]. (See [9] for the work on RST routing applications.)
The problem shown in Figure 1 can be viewed as a routing problem where
multiple “pins”, i.e., p0, ..., p30, are connected via an RST. This problem differs
from other RST routing applications studied earlier in that there are obstacles
on the grid. The obstacles are the parts of the chip occupied by previously routed
nets (i.e., trees), or existing devices (e.g., memory, registers).

1 http://www.tcs.hut.fi/Software/smodels/

2 http://www.cs.utexas.edu/users/tag/cmodels/


Our method always correctly determines whether a given RST routing problem
is solvable, and it always produces a solution if one exists. Existing methods
such as [10, 11] are mostly based on heuristics which cannot guarantee finding 
a routing solution even when one exists. The exact method of [12, 13] does not 
consider obstacles on the grid, and the exact method of [14] is for the problems 
with only 3 or 4 pins that lie on obstacle borders or on the border of the grid. 

Unlike the existing methods, our approach can be applied to some variations 
of RST routing with restrictions on the lengths of wires connecting the source pin 
to the sink pins. Such restrictions are needed to meet signal delay constraints. 
It also can be applied to other wire routing problems where pairs of pins are 
connected by wires that do not intersect with each other and that do not go 
through obstacles. In this sense, our method is more elaboration tolerant [15]. 

In the following, after describing RST construction, we provide a detailed 
description of the method mentioned above as it applies to RST routing and 
its variations. After we present some experimental results, we conclude with a 
comparison of our approach to the related ones, and a discussion of how our 
method applies to other wire routing problems mentioned above. The programs 
describing the RST routing domain and the RST routing problems below can 
be obtained from the first author. 

2 RST Construction 

We describe RST construction as a graph problem. Recall that a Steiner tree for
a set $S$ of vertices in a graph $(V, E)$ ($S \subseteq V$) is a tree $(V', E')$ that is a subgraph
of $(V, E)$ where $S \subseteq V'$ and the total length of edges in $E'$ is minimum. A
Rectilinear Steiner Tree (RST) for a set $S$ of vertices in a graph $(V, E)$ ($S \subseteq V$)
is a Steiner tree $(V', E')$ for $S$ in $(V, E)$ such that the edges in $E'$ are horizontal
or vertical line segments. In this paper, the length of an edge is the number
of segments contained in that edge. The problem of computing an RST is
NP-complete [2].

In the computational problem we are solving, the input graph $(V, E)$ is a
grid. Given a set $S$ of points on this grid, our goal is to find a tree in the grid
that connects all the points in S such that the total number n of unit segments 
covered by the tree is minimum, or to determine that there does not exist such 
a tree. Note that since the edges E of the given graph are vertical or horizontal, 
the edges of the tree in this graph are also vertical or horizontal. Therefore, the 
trees we compute are RSTs. 

To make sure that all points in S are connected via a tree on the grid, we 
use the following proposition: 

Proposition 1 For any finite graph $(V, E)$, and any set $V' \subseteq V$, the following
conditions are equivalent: (a) there exists a subgraph of $(V, E)$ with the set $V'$ of
vertices that is a tree; (b) there exists a vertex $v \in V'$ such that every vertex in
$V'$ is reachable from $v$ in $V'$.

Fig. 2. (a) An RST routing problem with the source pin s and 5 sink pins p0, ..., p4.
A solution to this problem is presented in (b).

const maxX = 4. const maxY = 4.

source(2,3).
sink(0,1,4). sink(1,4,4). sink(2,4,2). sink(3,0,2). sink(4,2,0).
obstacle(1,3,1). obstacle(0,1,0).

Fig. 3. Input file for the problem from Figure 2(a).



Proposition 1 allows us to specify one of the given points as the “source” 
point, and ensure that, for every other point (“sink” point), there is a path 
connecting it to the source point in the grid. When n is minimum, the union of 
paths connecting sink points to the source point forms an RST. 
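Condition (b) of Proposition 1 is a plain reachability test inside the chosen vertex set. The following Python sketch checks it with a breadth-first search; the function name and the edge-list encoding of the graph are assumptions made for this illustration.

```python
from collections import deque

def all_reachable_within(vertices, edges, start):
    """True iff every vertex in `vertices` is reachable from `start`
    using only edges whose two endpoints both lie in `vertices`."""
    vs = set(vertices)
    adj = {v: set() for v in vs}
    for u, w in edges:
        if u in vs and w in vs:
            adj[u].add(w)
            adj[w].add(u)
    seen, queue = {start}, deque([start])
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == vs

# A 2x2 grid: four points, four unit segments.
grid_edges = [((0, 0), (1, 0)), ((0, 1), (1, 1)), ((0, 0), (0, 1)), ((1, 0), (1, 1))]
print(all_reachable_within([(0, 0), (1, 0), (1, 1)], grid_edges, (0, 0)))
print(all_reachable_within([(0, 0), (1, 1)], grid_edges, (0, 0)))
```

In the first call the three points are connected inside the set, so a tree on exactly those vertices exists; in the second call the two diagonal points are not adjacent, so no such tree exists.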

In some variations of RST construction, we put restrictions on the lengths of 
paths between the source point and sink points. For instance, we can ensure that 
the length of each path connecting a source point to a sink point is at most l. 
In another variation, we can put restrictions on the lengths of specific paths. 

When we put restrictions on the lengths of paths, we may not find an RST. 
However, a smallest graph connecting the sink points and the source point that 
satisfies the given length restrictions is a tree: 

Proposition 2 Let $G = (V, E)$ be a finite graph, and $s$ be a vertex in $V$. Let $V'$
be a subset of $V$, and $H = (V', E')$ be a connected subgraph of $G$. If the total length
of edges in $E'$ is minimum subject to the condition that, for every vertex $x \in V'$, the
length of a path connecting $x$ to $s$ in $H$ is less than some given number $l_x$, then
$H$ is a tree.



3 Input and Output of CMODELS 

In our approach, the solutions of the RST routing problems are characterized
by the truth values of the atoms covered(S,X,Y) (“If S=h then the horizontal
segment connecting the points (X,Y) and (X+1,Y) is covered by the graph; if S=v
then the vertical segment connecting the points (X,Y) and (X,Y+1) is covered
by the graph”).

Consider, for instance, the problem shown in Figure 2(a). This problem is
described to CMODELS by the file presented in Figure 3. The size of the grid is
represented in that file by the constants maxX and maxY. These numeric values
are defined in each particular routing problem. The source pin and the sink pins
are specified. Next, we describe the shape of the obstacles in this example by
obstacle(1,3,1) and obstacle(0,1,0). Here, obstacle(X1,X2,Y) expresses
that there is an obstacle occupying the points covered by the rectangle defined
by the points (X1,Y), (X2,Y), (X1,Y+1), and (X2,Y+1).

To find a solution to this problem we need the file rst.lp, describing the
RST routing domain, and the file point.lp, describing the grid points that are
not covered by obstacles. Parts of the file rst.lp are discussed in the next section.
Given the files rst.lp and point.lp with the file presented in Figure 3, and an
upper bound n=12 on the total wire length, CMODELS finds the following tree:

Answer set:

covered(h,0,3) covered(h,1,3) covered(h,2,0) covered(h,2,3)
covered(h,3,0) covered(h,3,3) covered(v,0,2) covered(v,1,3)
covered(v,4,0) covered(v,4,1) covered(v,4,2) covered(v,4,3)

shown in Figure 2(b). CMODELS also determines that there is no tree whose total 
length is less than 12. It follows that the tree above is a solution. 
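The answer set above can be sanity-checked independently of the solver. The Python sketch below copies the pins and obstacles from Figure 3 and the covered atoms from the answer set, then verifies that the covered segments form a tree that touches every pin and avoids the obstacle points. The function names are hypothetical, and the check does not test grid bounds or minimality.

```python
from collections import deque

SOURCE = (2, 3)
SINKS = [(1, 4), (4, 4), (4, 2), (0, 2), (2, 0)]
OBSTACLES = [(1, 3, 1), (0, 1, 0)]  # obstacle(X1,X2,Y)
COVERED = [("h", 0, 3), ("h", 1, 3), ("h", 2, 0), ("h", 2, 3),
           ("h", 3, 0), ("h", 3, 3), ("v", 0, 2), ("v", 1, 3),
           ("v", 4, 0), ("v", 4, 1), ("v", 4, 2), ("v", 4, 3)]

def endpoints(s, x, y):
    """End points of the unit segment covered(S,X,Y)."""
    return ((x, y), (x + 1, y)) if s == "h" else ((x, y), (x, y + 1))

def blocked_points(obstacles):
    """Grid points inside the rectangles (X1,Y)-(X2,Y+1)."""
    return {(x, yy) for x1, x2, y in obstacles
            for x in range(x1, x2 + 1) for yy in (y, y + 1)}

def is_routing_tree(covered, pins, obstacles):
    edges = {endpoints(*c) for c in covered}
    nodes = {p for e in edges for p in e}
    if not set(pins) <= nodes or nodes & blocked_points(obstacles):
        return False
    adj = {p: [] for p in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, queue = {pins[0]}, deque([pins[0]])
    while queue:
        for q in adj[queue.popleft()]:
            if q not in seen:
                seen.add(q)
                queue.append(q)
    # Connected and |E| = |V| - 1 means the graph is a tree.
    return seen == nodes and len(edges) == len(nodes) - 1

print(len(COVERED), is_routing_tree(COVERED, [SOURCE] + SINKS, OBSTACLES))
```

The twelve segments span thirteen grid points, which is consistent with the bound n=12 used in the text.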

4 The RST Routing Domain 

As described in Section 2, we construct RSTs from paths connecting sink points
to the source point. Every path is characterized by the truth values of the atoms
in(S,N,X,Y) (“If S=h then the horizontal segment connecting the points (X,Y)
and (X+1,Y) occurs in Path N, a path connecting sink pin N to the source pin; if
S=v then the vertical segment connecting the points (X,Y) and (X,Y+1) occurs
in Path N”).

In the file rst.lp, first sets of atoms of the form in(S,N,X,Y) are “generated”
by the choice rule

{in(S,N,X,Y)} :- segment(S), path(N), point(X,Y).

where point(X,Y) defines, in the file point.lp, the grid points that are not
blocked by obstacles. Then these sets are “tested” with some constraints
expressing the following:

(i) the set describes a subgraph of the grid, 

(ii) the subgraph contains paths connecting the sink points to the source point, 
and 

(iii) the size of the subgraph is minimum. 

Due to Proposition 1, condition (ii) expresses that the subgraph contains a tree 
connecting the sink points and the source point. With condition (iii), this sub- 
graph forms a solution to the RST routing problem. 

First, for (i), we eliminate the sets that contain vertical segments connecting
a point (X,Y) on the grid that is not blocked by an obstacle to the point (X,Y+1)
that is blocked by an obstacle (or is above the upper edge of the grid):

:- in(v,N,X,Y), path(N), point(X,Y), not point(X,Y+1).

A similar constraint is added for horizontal segments.

Next, for (ii), we eliminate the sets that do not contain paths connecting the 
sink points to the source point by adding the following constraints. 

To express that the end points of a path should be in that path, we define
the atom at(N,X,Y) (“the point (X,Y) is in Path N”). We express that the end
points of Path N, specified in a problem description file, should be in Path N by
the constraint:

:- not at(N,X,Y), ends(N,X,Y).

We need to make sure that the end points of Path N cannot be connected
to two or more points in the path, whereas each of the other points of Path N
should be connected to exactly two points in the path. For that, we define the
atom at(N,X,Y,D) (“the unit segment that begins at the point (X,Y) and goes
in the direction D occurs in Path N”). We make sure that the end points of Path
N cannot be connected to two or more points by the constraint

:- 2 {at(N,X,Y,D) : direction(D)},
   path(N), ends(N,X,Y).

Each of the other points of Path N cannot be connected to exactly one point

:- 1 {at(N,X,Y,D) : direction(D)} 1,
   path(N), point(X,Y), not ends(N,X,Y).

and cannot be connected to three or more points.

:- 3 {at(N,X,Y,D) : direction(D)},
   path(N), point(X,Y), not ends(N,X,Y).

That is, each of these points should be connected to exactly two points.
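The effect of the three cardinality constraints can be mimicked outside the solver by counting, for each point, the unit segments of one path that touch it. The Python sketch below is only an illustration under that reading; the function names are hypothetical, and the separate requirement that the end points lie on the path is not checked here.

```python
from collections import Counter

def endpoints(s, x, y):
    """End points of the unit segment in(S,N,X,Y) for a fixed path N."""
    return ((x, y), (x + 1, y)) if s == "h" else ((x, y), (x, y + 1))

def violates_degree_constraints(segments, ends):
    """True iff one of the three constraints above would eliminate this path."""
    degree = Counter()
    for s, x, y in segments:
        for p in endpoints(s, x, y):
            degree[p] += 1
    # End points: connected to two or more points is forbidden.
    if any(degree[p] >= 2 for p in ends):
        return True
    # Other points touched by the path: exactly one, or three or more, is forbidden.
    return any(d == 1 or d >= 3 for p, d in degree.items() if p not in ends)

# A path from (0,2) up to (0,3) and then right to (2,3).
path = [("v", 0, 2), ("h", 0, 3), ("h", 1, 3)]
print(violates_degree_constraints(path, {(0, 2), (2, 3)}))
print(violates_degree_constraints(path + [("v", 1, 3)], {(0, 2), (2, 3)}))
```

The first call describes a simple path and passes; adding the branch at (1,3) gives that point three neighbours, so the second call reports a violation.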

Finally, for (iii), we eliminate the sets where the total wire length is larger
than n by adding the constraint

:- n+1 {covered(S,X,Y) : segment(S) : point(X,Y)}.

where covered(S,X,Y) is defined as

covered(S,X,Y) :- in(S,N,X,Y), segment(S), path(N), point(X,Y).

If the total wire length n is minimum then the graph generated by the pro- 
gram above is an RST. For instance, with the routing domain described above, 
CMODELS finds the tree presented in Figure 1 with n=76. For n=75, CMODELS 
determines that there is no answer set for the program; therefore, the tree pre- 
sented in Figure 1 is an RST. 
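Since the constraint for (iii) only imposes an upper bound, a solution for bound n is also a solution for every larger bound, so the minimum can be located by bisection over n. The Python sketch below assumes a hypothetical callback `has_answer_set(n)` that runs the solver with bound n and reports whether an answer set exists; the paper itself only reports the two runs with n=76 and n=75.

```python
def minimal_bound(has_answer_set, lo, hi):
    """Smallest n in [lo, hi] for which has_answer_set(n) is true, or None.

    Relies on monotonicity: if a tree with at most n segments exists,
    one with at most n+1 segments exists as well.
    """
    if not has_answer_set(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if has_answer_set(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Stand-in for the solver, mirroring the reported outcome for Figure 1.
def fake_solver(n):
    return n >= 76

print(minimal_bound(fake_solver, 0, 200))
```

With a real solver each call is expensive, so bisection keeps the number of runs logarithmic in the size of the search interval.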

5 Restricting the Lengths of Wires 

An RST routing problem may involve constraints on the lengths of some wires 
connecting the sink pins to the source pin, to meet some signal delay constraints. 
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(a) (b) 

Fig. 4. (a) An RST routing problem with 7 sink pins, along with a solution where 
the total wire length is minimum, and the length of each wire is at most 13. (b) A 
near-solution to the RST routing problem in (a) where the length of Path 0 is required 
to be at most 8. 



We can express that no wire can be longer than a specific value, say l, by 
adding to the problem description the constraint 

:- l+2 {at(N,X,Y) : point(X,Y)}, path(N). 

In other words, no path connecting the source pin to a sink pin covers l+1 or more 
unit segments on the grid. Consider, for instance, the problem shown in Figure 4(a), 
along with a solution. In this solution, the total wire length is minimum (n=26), 
and each wire length can be at most 13 (l=13). There is no solution for this 
problem with n=26 and l=12. 
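
The arithmetic behind the bound can be made explicit with a small Python sketch: a path through k grid points covers k-1 unit segments, so a path violates a bound on its length exactly when it has that bound plus two points or more. The function name and the list-of-points representation are hypothetical and serve only to illustrate the counting.

```python
def violates_length_bound(path_points, max_length):
    """True if the path covers more than max_length unit segments.

    path_points: the grid points covered by one path (each counted once).
    A path through k distinct points covers k - 1 unit segments, so the
    bound is broken exactly when k >= max_length + 2.
    """
    return len(set(path_points)) >= max_length + 2


# A straight wire through 14 points covers 13 unit segments.
wire = [(x, 0) for x in range(14)]
```

For this wire, a bound of 13 is respected and a bound of 12 is violated.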

We can put restrictions on the length of a specific wire as well. For instance, 
for the problem described in Figure 4(a), we can express that the wire corre- 
sponding to Path 0 cannot be longer than 8 by adding to the problem description 
the constraint 

:- 8+2 {at(0,X,Y) : point(X,Y)}. 

After this change, a tree can be found by CMODELS as shown in Figure 4(b). 
Here, the total wire length is 28, and each wire length is at most 13; there is no 
solution with the total wire length being less than 28. 

When we put restrictions on the lengths of wires, we may not find an RST. 
For instance, when we restrict the length of Path 0 above, a connected graph 
with minimum total wire length n=26 does not exist. However, due to Propo- 
sition 2, we know that a smallest graph connecting the sink pins to the source 
pin that satisfies the given length restrictions is a tree. For instance, the graph 

Fig. 5. (a) A cyclic graph and (b) a tree connecting the sink pins p0, ..., p6 to the source 
pin s. 



in Figure 4(b) satisfies the restriction on the length of Path 0, and there is no 
such graph that has a smaller total length (n<28). 

6 Constructing Trees 

If the upper bound n for the total number of the segments covered by the paths is 
not tight enough then the graph generated by the program described in Section 4 
may not be a tree, i.e., there may be cycles as in Figure 5(a). Since a path 
connecting a sink point to the source point does not contain any cycles, these 
cycles are of two forms: either they do not contain any of the sink points 
and the source point, or they are formed by different paths. In Figure 5(a), we 
can see both kinds of cycles. 

To make sure that the graph generated by the program described in Section 4 
is a tree, due to the definition of a tree, we need to make sure that every point 
covered by the graph is reachable from the source point via exactly one path. For 
that, we add to the program describing the RST routing domain the constraint 

:- at(N,X,Y), at(N1,X,Y), not reachable(N,N1,X,Y), 
   point(X,Y), path(N;N1), N<=N1. 

Here reachable(N,N1,X,Y) expresses that every point (X,Y) covered by Paths 
N and N1 is reachable from the source point via the segments covered by both 
paths, and it has a straightforward recursive definition. The base case is defined 
by the rule 

reachable(N,N1,X,Y) :- source(X,Y), path(N;N1), N <= N1. 

and the inductive step is defined by rules like 

reachable(N,N1,X+1,Y) :- reachable(N,N1,X,Y), path(N;N1), N <= N1, 
   in(h,N,X,Y), in(h,N1,X,Y), point(X,Y), point(X+1,Y). 

In the case of N = N1, the constraint above expresses that every point covered 
by a path is reachable from the source point. This eliminates the cycles of the 
first form, and makes sure that the graph is connected. 

In the case of N ≠ N1, the constraint expresses that a point covered by two 
different paths is not reachable from the source point via two different paths. 
This eliminates the cycles of the second form, and, since a path connecting a 
sink point to the source point does not contain any cycles, it makes sure that, 
in a connected graph, paths do not form any cycles. 

For instance, for the problem presented in Figure 5(a), if the upper bound 
on the total wire length is set to 40 then, with the constraint above, CMODELS 
computes the tree in Figure 5(b). If the total wire length is minimum (n=26) 
then CMODELS computes the RST in Figure 4(a). 
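
The property enforced here can also be checked on a computed graph outside the solver. The Python sketch below is not the reachable/4 encoding itself but an independent check under stated assumptions: the graph is given as a list of undirected unit segments, and the function name is hypothetical. It uses a breadth-first search from the source and the fact that a connected graph is a tree exactly when it has one edge fewer than it has vertices.

```python
from collections import deque


def is_tree(segments, source):
    """True if the segments form a tree in which every covered point is
    reachable from the source point."""
    adjacency = {}
    edges = set()
    for p, q in segments:
        edges.add(frozenset((p, q)))
        adjacency.setdefault(p, set()).add(q)
        adjacency.setdefault(q, set()).add(p)
    if source not in adjacency:
        return not edges  # an empty graph is trivially a tree
    seen = {source}
    queue = deque([source])
    while queue:
        point = queue.popleft()
        for neighbour in adjacency[point]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    connected = len(seen) == len(adjacency)
    return connected and len(edges) == len(adjacency) - 1


square = [((0, 0), (1, 0)), ((1, 0), (1, 1)),
          ((1, 1), (0, 1)), ((0, 1), (0, 0))]
```

The unit square is a cycle and is rejected; dropping any one of its four segments yields a tree.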

Note that the program describing the routing domain in Section 4 is a “tight” 
program, and it becomes “nontight” with the definition of reachability above [16]. 

7 Experimental Results 

In the computational problem we are solving, the input consists of a program 
describing the locations of the pins and the obstacles on a grid (Section 3), the 
program describing the RST routing domain (Section 4), and an upper bound 
n on the total wire length. Given this input, CMODELS computes a graph that 
connects the pins without going through the obstacles where the total wire length 
is at most n, or determines that such a graph does not exist. Our goal is to 
compute an RST (Section 2), if it exists. For that, first we make sure that the 
given upper bound is not too small so that CMODELS can compute a solution, 
and then we call CMODELS on the given input programs with decreasing values of 
the upper bound until we reach the minimum total wire length. (Alternatively, 
binary search can be used.) We reach the minimum total wire length n, when 
CMODELS computes a solution for the upper bound n, and determines that a 
solution does not exist for the upper bound n - 1. 
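
The two search strategies mentioned above can be sketched in Python as follows. The callable solvable is a hypothetical stand-in for one call to the solver with a given upper bound; it is assumed to be monotone (if a graph of total length at most n exists, one also exists for every larger bound), which holds because the bound is an upper bound on the total wire length.

```python
def minimum_by_decreasing(solvable, upper):
    """Decrease the bound one step at a time until no solution exists."""
    if not solvable(upper):
        raise ValueError("upper bound is too small: no solution found")
    n = upper
    while n > 0 and solvable(n - 1):
        n -= 1
    return n


def minimum_by_binary_search(solvable, upper):
    """Same result with a logarithmic number of solver calls."""
    if not solvable(upper):
        raise ValueError("upper bound is too small: no solution found")
    lo, hi = 0, upper  # invariant: solvable(hi) holds, nothing below lo does
    while lo < hi:
        mid = (lo + hi) // 2
        if solvable(mid):
            hi = mid
        else:
            lo = mid + 1
    return hi


def toy_oracle(n):
    # Toy oracle: pretends solutions exist exactly for bounds of 76 or more.
    return n >= 76
```

Both functions return 76 for the toy oracle; the binary variant needs far fewer oracle calls when the initial bound is loose.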

To make the computation more efficient, we introduce two “circles” with a 
given radius around the endpoints of a path we are looking for, and require that 
the path be contained in the union of these circles. This modification sometimes 
improves the computation time of CMODELS significantly. For instance, for the 
problem shown in Figure 1, introducing circles of radius=8 around the endpoints 
improves the computation time of a solution (n=76) by a factor of 10. However, 
adding circles around endpoints can prevent CMODELS from finding a solution 
to a solvable problem if the value of radius is too small. For instance, for the 
problem above, if we reduce radius by 1 then we cannot find a solution. For 
this reason, when we reach an upper bound, for which there is no solution with 
circles, we continue the process of computing an RST without introducing any 
circles. 
Problems   # of sink pins   n     CPU time 
A          15               58    14 
                            57*   67 
B          20               68    17 
                            67*   128 
C          25               74    21 
                            73*   450 
D          30               76    23 
                            75*   164 

Fig. 6. Computation times for problems A, ..., D on a 15 x 15 grid when the total wire 
length (n) is minimum. 



Another way to make the computation of an RST more efficient is to prohibit 
“adjacencies” of unit segments. (Two unit segments are adjacent if they form the 
opposite sides of a unit square.) This is possible due to the following proposition. 

Proposition 3 Let (V, E) be a rectilinear tree where each edge is a unit segment. 
Then there exists a rectilinear tree (V, E') without any adjacencies of unit 
segments such that |E'| = |E|. 

With no adjacencies, for instance, the computation time of a solution (n=76, 
radius=8) for the problem in Figure 1 can be improved by a factor of 7. 
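
To make the notion of adjacency concrete, here is a small Python sketch that lists the pairs of unit segments forming opposite sides of a unit square. It is an illustration only, not the constraint used in the program; the function names and the segment representation are assumptions.

```python
def normalize(segment):
    """Order the two end points of a unit segment lexicographically."""
    p, q = segment
    return (p, q) if p <= q else (q, p)


def adjacent_pairs(segments):
    """Return the pairs of unit segments that are opposite sides of a
    unit square, each pair reported once."""
    segs = {normalize(s) for s in segments}
    pairs = []
    for (x1, y1), (x2, y2) in sorted(segs):
        if y1 == y2:   # horizontal segment: look one unit up
            other = ((x1, y1 + 1), (x2, y2 + 1))
        else:          # vertical segment: look one unit to the right
            other = ((x1 + 1, y1), (x2 + 1, y2))
        if other in segs:
            pairs.append((((x1, y1), (x2, y2)), other))
    return pairs


u_shape = [((0, 0), (1, 0)), ((1, 0), (1, 1)), ((1, 1), (0, 1))]
```

In the U-shaped example the bottom and top segments are adjacent, which is the kind of configuration the restriction rules out.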

Given the programs describing a specific routing problem and describing the 
RST routing domain, with n and radius specified, CMODELS transforms the 
programs into a propositional theory [17], and calls a SAT solver to compute 
the models of this theory, which are identical to the answer sets for the given 
programs³. In our experiments, we use CMODELS (Version 2.01) and lparse 
(Version 1.0.13), with the SAT solver ZCHAFF (Version Z2003.11.04)⁴. All CPU 
times presented below are in seconds for a PC with a 733 MHz Intel Pentium 
III processor and 256MB RAM, running SuSE Linux (Version 8.1). 

We consider 4 problems A, ..., D on a grid of size 15 x 15 with 15, 20, 25, 
30 sink pins respectively. Note that the RST problems in VLSI routing typically 
have a small number of sink pins (e.g., less than 10), so our test problems are of 
reasonable size for that application. Figure 6 shows the computation times for 
these problems, when radius=8. This value of radius is large enough to find 
a solution to each problem with the minimum total wire length. The character 
* denotes that the problem does not have a solution. For instance, problem A 
does not have a solution where the total wire length is 57; CMODELS finds this 
out in 67 seconds, without introducing any circles. 

When we relax the restriction on the total wire length to compute a tree 
(as in Section 6), the computation times sometimes decrease. For instance, for 
problem B, a solution with the minimum total wire length (n=68) is computed 
in 17 seconds. If we allow the total wire length to be at most 85, a tree of size 
79 is computed in 7 seconds. 

3 For nontight programs, CMODELS operates with a “generate and test” approach [18]. 

4 http://www.ee.princeton.edu/~chaff/zchaff.php 
Problems   # of sink pins   n     l     CPU time 
A          15               58    16    17 
                                  15*   16 
B          20               68    19    40 
                                  18*   59 
C          25               74    17    32 
                                  16*   40 
D          30               76    21    69 
                                  20*   82 

Fig. 7. Computation times for problems A, ..., D on a 15 x 15 grid when the total wire 
length (n) is minimum, and each wire length is at most l. 

Figure 7 shows the computation times for problems A, ..., D when the length 
of each wire connecting a sink pin to the source pin is bounded by l. The 
results are for radius=8. Compared to the results in Figure 6, we can see that 
restricting the length of each wire by l sometimes increases the computation 
time. For instance, for problem D, computing a solution with n=76 takes 23 
seconds whereas a solution with n=76 and l=21 takes 69 seconds. 

Another answer set solver that we can use to solve the routing problems with 
our formalizations above is SMODELS. However, solutions to the problems above 
can be computed faster using CMODELS. For instance, a problem with 7 pins on 
a 10 x 10 grid can be solved by CMODELS in less than a second whereas a solution 
cannot be found in less than a minute with SMODELS (Version 2.27). 

One way to extend our approach to problems with larger grid size is to 
consider a small set of points on the grid that would be sufficient to construct 
a tree. These points can be identified by the following hierarchical process in 
a multi-level manner. At the first level, we consider the given n x n grid as 
an (n/m) x (n/m) small grid. Each point of this small grid represents an m x m 
subgrid of the given grid. We construct an RST on this small grid using CMODELS, 
and obtain the points of the given grid covered by this RST. We repeat this 
process by decreasing m at each level until we obtain a small set of points on 
the grid that would be sufficient to construct a tree. Note that a tree computed 
by this process is not guaranteed to have the minimum total length. We have 
implemented the algorithm above as a Perl program, and experimented with 
two problems E and F on a 100 x 100 grid with 5 and 10 pins respectively. For 
each problem, first we have obtained a subgraph of the grid that is sufficient to 
find a tree that connects the given pins, by considering the given grid first as 
a 10 x 10 grid, then as a 20 x 20 grid, then as a 25 x 25 grid, and finally as a 
50 x 50 grid. This process takes 20 seconds for problem E and 36 seconds for 
problem F. After that, we have computed some trees connecting the given pins 
over these subgraphs. Figure 8 shows these computation times when radius=30. 
For instance, we can compute a tree of size 157 in 75 seconds for problem F. 
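
One level of the hierarchical process can be sketched in Python as below. This is not the Perl program mentioned above: the function names are hypothetical, coordinates are assumed to start at 0, and m is assumed to divide the grid size. The sketch only shows the mapping between the given grid and the small grid; constructing the RST on the small grid is left to the solver.

```python
def coarsen(points, m):
    """Map points of the given grid to points of the (n/m) x (n/m) grid."""
    return {(x // m, y // m) for x, y in points}


def expand(coarse_points, m):
    """Return all points of the given grid represented by the coarse
    points, i.e. the union of the corresponding m x m subgrids."""
    return {(cx * m + dx, cy * m + dy)
            for cx, cy in coarse_points
            for dx in range(m)
            for dy in range(m)}


pins = {(0, 0), (9, 9), (10, 0)}
coarse_pins = coarsen(pins, 10)
candidates = expand({(1, 0)}, 10)
```

Here two pins fall into the same cell of the small grid, and the coarse point (1,0) expands back to the 100 points with x between 10 and 19 and y between 0 and 9.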

8 Discussion 

We have introduced a formal method for RST construction, in the context of 
VLSI routing, using answer set programming. This method provides a concise 
formal representation of the problem as a logic program whose answer sets correspond 
to solutions. The answer sets for the given formalism are computed using 
the answer set solver CMODELS. 

Our method always correctly determines whether a given RST routing prob- 
lem, possibly with the presence of obstacles, is solvable, and finds a solution if 
one exists. The two other exact methods for RST routing are presented in [14] 
and [12, 13]. In [14], the authors consider RST problems with 3 or 4 pins. They 
identify the possible topologies, and then describe exact algorithms for each 
topology. Our method is more general in that any number of pins can be con- 
nected by an RST without having to identify all possible topologies. In [12, 13], 
the authors do not consider obstacles. The idea for RST construction is first 
to generate some full Steiner trees (Steiner trees where the source and the sink 
points are the leaves of the tree) for subsets of the given pins, and then to con- 
struct a Steiner tree from a subset of these full Steiner trees. Our method is more 
general in that RST construction in the presence of obstacles can be handled. 

Other existing methods for RST construction such as [10, 11] are based on 
heuristics which cannot guarantee finding a routing solution even when one ex- 
ists. 

Another difference of our method from the ones mentioned above is that it 
allows us to solve variations of RST routing where some restrictions are put on 
the lengths of paths connecting the sink points to the source pin. For instance, 
we can ensure that the length of each path connecting a source pin to a sink 
pin is at most l. In another variation, we can put restrictions on the lengths of 
specific paths. In such cases, our method guarantees that we find a tree even 
when the total wire length is not minimum. 

In some routing problems, the goal is to connect pairs of pins with wires, 
and the solutions consist of paths that are required not to intersect with each 
other, and that do not intersect with obstacles. For instance, a routing problem, 
along with its solution computed by CMODELS, is displayed in Figure 9. Such 
problems can be solved with our formalization (Section 2), by adding to the 
problem description the constraint 

:- at(N,X,Y), at(N1,X,Y), N<N1, path(N), path(N1), point(X,Y). 

expressing that no two paths intersect⁵. As in the case of the RST routing problem, 
variations of these routing problems, in which lengths of wires and distances 
between them come into play, can easily be represented by slight modifications. 

5 For routing problems, where pairs of pins are connected, some other ASP repre- 
sentations are presented in [19-21], and one is due to Tommi Syrjanen (personal 
communication, July 31, 2000). With the formalization above, such routing prob- 
lems can be solved more efficiently. For instance, a problem with 20 pairs of pins on 
a grid of 15 x 15 can be solved in 21 seconds with our program, using about 80MB 
of memory. With the encoding of [20], the computation time is 57 seconds and the 
used memory is about 300MB. A solution for this problem cannot be found in less 
than a minute with the other formalizations. In our formalization, paths are defined 
in terms of segments, and the grid does not have to be rectangular. This allows us 
to solve RST routing problems avoiding cycles and with a hierarchical approach. 
Problems   # of pins   n     CPU time 
E          5           111   16 
                       115   9 
                       119   6 
F          10          157   75 
                       163   64 
                       170   32 

Fig. 8. Computation times for problems E and F on a 100 x 100 grid when the total 
wire length is at most n, using hierarchical routing. 




Fig. 9. A routing problem where pairs of pins are connected. 



For instance, we can solve bus routing problems where all wires connecting pairs 
of pins should be of the same length so that the signal delays through all wires 
are equal. In another enhancement, we can prohibit the adjacency of wires, to 
avoid signal interferences. In this sense, this representation method is elaboration 
tolerant. 
Abstract. We investigate the methodology of utilizing domain dependent knowledge 
in solving the planning problem in answer set programming. We provide 
a classification of domain dependent knowledge, and for each class, a coding 
scheme. In this way, domain dependent knowledge can be encoded into an exist- 
ing program. Experiments are conducted to illustrate the effect of adding domain 
dependent knowledge for benchmark planning problems, which show that adding 
domain dependent knowledge in many cases substantially improves the search ef- 
ficiency. 



1 Introduction 

The planning problem is to find a sequence of actions that leads from the initial state 
to the goal state. The problem is computationally difficult. It is PSPACE-complete to 
determine if a given planning instance has any solution [6]. By fixing the length of 
plans, the complexity reduces to NP-completeness. 

As a recent addition to the paradigm of declarative programming, answer set 
programming (ASP) under the stable model semantics [9] has been used to solve 
planning problems [4, 14, 15, 18, 21], where a planning problem is represented by a program 
specifying the constraints and rules that must be satisfied by any plan, and each answer 
set corresponds to a solution to the given planning problem. 

Although the methodology for solving the planning problem in ASP is generally 
well-understood, the methodology for extracting and encoding domain dependent 
knowledge is not. The work in [20] showed how procedural knowledge expressed by 
temporal formulas can be encoded into answer set planning. In general, however, the 
questions remain for anyone who chooses to use ASP to solve a planning problem: what 
types of domain dependent knowledge one should look for, how each type may be encoded 
into an existing program, and what kind of pitfalls to avoid. The work reported 
here is intended to serve as an important step in finding answers to these questions. 
There is a growing belief that domain dependent knowledge could be the key to future 
performance gains [15, 23]. 

To utilize domain dependent knowledge, one needs to extract knowledge from the 
planning domain. A classification of the knowledge can serve as guidance on how 
domain dependent knowledge may be extracted. We are aware of no previous work in 
the literature that classifies domain dependent knowledge for planning, especially in the 
context of ASP. In this report we first classify domain dependent knowledge into five 
categories. In addition, a coding scheme in ASP is provided for each class. The idea is 
that domain dependent knowledge is encoded into an existing solution either by adding 
new constraints or by strengthening the conditions of some action rules. 

Domain dependent knowledge has been incorporated into a number of systems, e.g., 
CPlan [22], TLPlan [1], and TALplan [5]. Our approach differs from these works. First, 
domain dependent knowledge is not encoded into the underlying search algorithm, as 
was done in CPlan. That is, the information we use does not make references to the 
underlying planning algorithm. Second, in contrast with TLPlan and TALplan, we do 
not build a new language for the encoding of domain dependent knowledge. In our case, 
the knowledge is encoded as constraints into the original program in the same language. 

The next section reviews answer set programming for planning. Section 3 classifies 
domain dependent knowledge. We then begin to present and discuss experimental re- 
sults on five benchmark planning problems, each with some unique characteristics. In 
Section 4, one can find the experiment setup and a description of the benchmarks. Sec- 
tion 5 presents and summarizes experimental results, with Section 6 commenting on the 
related work and future directions. The programs and test data used in our experiments 
can be found at http://www.cs.ualberta.ca/~you/thesis-xiumei/Program.htm. 

2 Answer Set Programming for Planning 

In this paper, ASP refers to logic programming under the stable model semantics. A 
logic program consists of rules of the form $A \leftarrow B_1, \ldots, B_m, \mathit{not}\ C_1, \ldots, \mathit{not}\ C_n$, 
where $A$, $B_i$ and $C_j$ are function-free atoms. Such a rule is called a basic rule. The 
head A may be omitted, in which case it serves as a constraint where the body must 
be false in any stable model. In systems like Smodels and DLV these programs are 
instantiated to ground instances for the computation of stable models. The definition of 
stable model is given in [9]. 

ASP has been extended with new constructs [17], in particular 

choice rule: $\{h_1, \ldots, h_m\} \leftarrow body.$ 

cardinality constraint: $L\{a_1, \ldots, a_n, \mathit{not}\ b_1, \ldots, \mathit{not}\ b_m\}U$ 

In a choice rule, if the body is satisfied then any subset of the head may be included in 
a stable model. In a cardinality constraint, L and U are integers giving the lower and 
upper bounds of the constraint, respectively. It is satisfied by a model if the cardinality 
of the subset of the literals satisfied by the model is between the integers L and U, 
inclusively. A cardinality constraint can be used just like a literal anywhere in a rule. 
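
The satisfaction condition for a cardinality constraint can be written out as a short Python sketch. The function name and the representation of a model as a set of true atoms are assumptions made for illustration; the counting rule itself is the one stated above.

```python
def satisfies_cardinality(lower, upper, positive, negative, model):
    """Check L{a_1, ..., a_n, not b_1, ..., not b_m}U against a model.

    positive: the atoms a_1, ..., a_n.
    negative: the atoms b_1, ..., b_m that occur under 'not'.
    model: the set of atoms that are true.
    """
    count = sum(1 for a in positive if a in model)
    count += sum(1 for b in negative if b not in model)
    return lower <= count <= upper
```

For example, 1{a, b, c}2 is satisfied by the model {a, b} but not by {a, b, c}, and 1{a, not b}1 is not satisfied by {a} because both literals hold there.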

To solve a planning problem in ASP, we need to represent: (a) fluents and actions; 
(b) initial and goal states; and (c) an action system. The fluents and actions are 
represented by predicates in logic programs. For actions and state-related fluents, we add a 
state parameter T to represent explicitly the state that the fluents and actions are about. 
To represent an action system for planning, four groups of rules are needed. 

- Action choice: which actions should be chosen; 

- Affected objects: what objects are affected by an action; 

- Effects: if affected, what are the effects on the affected objects; and 

- Frame axioms: if not affected by any action at a state, the fluents that hold at the 
current state remain to hold in the next state. 
An action system can be realized in ASP by the following encoding. In the first 
two rule schemes below, action choice is represented by a choice rule, along with a 
constraint preventing an action and its conflicting actions from happening at the same 
state (such as moving a block to two distinct locations at the same state). In the next 
two rules, the affected object is identified and the change of property is specified. Finally, a frame 
axiom states that a property of an object that holds at the current state will continue to 
hold at the next state if the object is not affected by any action at the current state. 

$\{action(Obj_1, \ldots, Obj_n, T)\} \leftarrow preconditions.$ 

$\leftarrow action(Obj_1, \ldots, Obj_n, T),\ conflictingAction(Obj_1, \ldots, Obj_m, T).$ 

$affected(Obj, T) \leftarrow action(T).$ 

$property_2(Obj, T+1) \leftarrow action(T),\ property_1(Obj, T).$ 

$property(Obj, T+1) \leftarrow \mathit{not}\ affected(Obj, T),\ property(Obj, T).$ 
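
The interplay of the effect rule and the frame axiom can be imitated by a small Python sketch. It is a deliberate simplification and not the ASP encoding: each object is assumed to carry a single property, the chosen actions are summarized by the effects they have at the current state, and all names are hypothetical.

```python
def next_state(properties, effects):
    """Compute the properties at state T+1 from those at state T.

    properties: dict mapping each object to its property at state T.
    effects: dict mapping each object affected at T to its new property.
    """
    successor = {}
    for obj, prop in properties.items():
        if obj in effects:
            successor[obj] = effects[obj]  # affected: the effect rule applies
        else:
            successor[obj] = prop          # unaffected: the frame axiom applies
    return successor


state0 = {"ball1": "roomA", "ball2": "roomA"}
state1 = next_state(state0, {"ball1": "gripper"})
```

Only ball1 is affected in this step, so ball2 keeps its property by the frame axiom.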



3 Classification of Domain Dependent Knowledge 

In this section, a classification of the nature of domain dependent knowledge is pro- 
posed. If the domain dependent knowledge does not rule out any optimal plan (mea- 
sured by the number of actions in this paper) for the underlying planning domain, we 
call the knowledge optimality knowledge (adapted from [12]). All the domain depen- 
dent knowledge discussed here is optimality knowledge. 

We often refer to the gripper problem [16] for illustration. The goal of this problem 
is to transport all the balls from one room to the other by a robot. To accomplish this, 
the robot is allowed to move from one room to the other, and use its two grippers to pick 
up and put down balls. Each gripper can hold one ball at a time. Here, parallel actions 
are allowed: two grippers could pick up or put down balls at the same state¹. 



3.1 End State Knowledge 

The end state knowledge (ESK) is the domain information about the initial state and 
goal state. It extracts static information about these states. 

With the knowledge about the initial or goal properties of an object, taking an action 
on this object may be redundant. For example, in the gripper problem, if a ball is already 
in the goal room, then pickup will be a redundant action for this ball. Similarly, if the 
ball is on a gripper and the robot is not in the ball’s goal room, putdown is a wasted 
action. ESK can be formulated by the following rule: 

if $\chi$ then not Action (1) 

where $\chi$ represents initial or goal properties of an object which hold at the current state, and 
Action represents the action that affects $\chi$. This ensures that the action does not occur at a 
state where $\chi$ holds. 

1 Encoding that allows parallel actions is generally simpler, since there is no need to restrict 
that only one action at a time is allowed. 
To encode ESK into the original program, we simply put not $\chi$ into Action's 
preconditions. The only rules that are affected by ESK are action rules. Thus, we have 
the coding scheme for ESK: 

$HeadAction(Obj, T) \leftarrow preconditions,\ \mathit{not}\ \chi.$ (2) 

where $HeadAction(Obj, T)$ denotes the head of the corresponding action rule in the 
original program. 

3.2 State Relationship Knowledge 

During the construction of a plan, the relationships among the properties of two or more 
states are good sources of domain knowledge. We identify two subclasses here. 

Relationship about state properties. A relationship among the properties that concern 
multiple states is established by a sequence of actions. Such a relationship should not 
exist if the action sequences that construct it introduce redundant actions to plans. 

For example, for the gripper domain, the effect, “return a ball back to its original 
room”, can be expressed by a relation among three properties: 
$\{at(A, R, T_0),\ \mathit{not}\ at(A, R, T_0 + 1),\ at(A, R, T_1)\}$ where $T_1 > T_0 + 1$, and $at(A, R, T)$ means $A$ is 
in room $R$ at state $T$. This relationship can be formed, among others, by the following 
sequence of actions: gripper $G$ picks up ball $A$ at $T$, moves to another room at $T + 1$, 
moves back to the original room at $T + 2$, and puts the ball down at $T + 3$. Destroying such 
a relationship will prune all the action sets that construct the relationship. This type of 
knowledge can be formulated as well as encoded by a constraint, $\leftarrow \chi_1, \ldots, \chi_n$, where 
$\chi_1, \ldots, \chi_n$ together construct the “bad” relationship. 
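
To illustrate what the constraint rules out, the following Python sketch scans the sequence of rooms occupied by one ball and reports whether the “bad” relationship occurs. The trace representation (one entry per state, None while the ball is carried) and the function name are assumptions for illustration, not part of the encoding.

```python
def returns_to_room(locations):
    """True if the ball is in room R at some state T0, not in R at T0 + 1,
    and in R again at some state T1 > T0 + 1.

    locations[t] is the room of the ball at state t, or None while it is
    being carried.
    """
    for t0 in range(len(locations) - 2):
        room = locations[t0]
        if room is None or locations[t0 + 1] == room:
            continue
        if room in locations[t0 + 2:]:
            return True
    return False
```

The trace a, carried, carried, carried, a exhibits the relationship, whereas a, carried, carried, b does not.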

Relationship among state conditions and actions. Another subclass of SRK concerns 
the relationships among state conditions and actions. The SRK of this subclass is the 
same as ESK described earlier except it may not be related to end states. We will therefore 
refer to Rules 1 and 2 for its formulation and encoding. E.g., one piece of SRK in 
the blocks world, in the form of Rule 1, is 

if not goodTower(Y, T) then not move(X, Y, T) 

which says that if block Y is not on a good tower at T, do not move X onto Y . A good 
tower is one in which the blocks are already at their goal positions. According to the 
encoding for ESK (cf. Rule 2), we can define a predicate goodTower(X, T) and add it 
as a condition into the action rules that move a block onto another block. 

3.3 Procedural Knowledge 

Procedural knowledge (PK) extracts sequential information from the sequence of ac- 
tions in final plans. This type of knowledge is powerful because it provides a shortcut 
for deriving new actions directly from “existing” ones. It can also provide extra con- 
straints among actions to guarantee the right sequence in the final plans. 

We identify four subtypes of PK that frequently appear in planning domains. First, 
we single out procedural knowledge with unknown objects (that occur in action pred- 
icates). E.g., boarding a bus at one location should precede unboarding it at another 
location. The latter location may be unknown from the planning domain. On the other 
hand, the object bus is known to refer to the same bus. 

For PK where the objects are all known, we distinguish the knowledge with known 
state distances from the one with unknown state distances. In addition, any procedural 
knowledge can be total or partial. While the latter specifies a strict order among $n$ 
actions, total knowledge in addition requires that when any $n - 1$ actions occur at 
their specified positions in a sequence, the remaining $n$th action must also occur in the 
sequence at its specified position. 

In general, we need to consider orderings among not only single actions but sets of 
actions. We now give the details. 

Total procedural knowledge with known state distances. Let $action_i(T_i)$ be an action 
that occurs at state $T_i$. The notation $c : action_i(T_i) \prec_k action_j(T_j)$ expresses 
that under the condition $c$, once either action occurs in a plan, the other must also occur 
in the same plan; and in the plan $action_j$ must occur $k$ states later than $action_i$, where 
$k$ is a numerical expression. 

Notation: $n(action_1(T) \mid \cdots \mid action_i(T))m$ denotes that, of all the $i$ actions, $l$ 
actions in the list may occur at state $T$ where $n \le l \le m$. 

Total procedural knowledge with known state distances can be formulated by 

$c : \chi_1(T_1) \prec_{k_{1,2}} \chi_2(T_2) \prec_{k_{2,3}} \cdots \prec_{k_{n-1,n}} \chi_n(T_n)$ (3) 

where $\chi_i(T_i)$ is of the form $action_i(T_i)$ or $n(action_1(T) \mid \cdots \mid action_m(T))m$. This 
rule says that under the condition represented by $c$, once any $n - 1$ elements in the list 
occur in a plan in the specified order, the $n$th element must occur in the same plan and 
all the elements must occur in the specified order with said state distances. 

To encode the above rule in ASP, for each element $\chi_i(T_i)$, we construct the following 
logic program rule: 

$\chi'_i(T_i) \leftarrow \chi'_1(T_1), \ldots, \chi'_{i-1}(T_{i-1}), \chi'_{i+1}(T_{i+1}), \ldots, \chi'_n(T_n),$ 
$\qquad c,\ T_2 - T_1 = k_{1,2}, \ldots, T_n - T_{n-1} = k_{n-1,n}.$ (4) 

where $\chi'_i = \chi_i$ if $\chi_i(T_i)$ is of the form $action_i(T_i)$, and 
$\chi'_i = n\{action_{j_1}(T), \ldots, action_{j_q}(T)\}m$ if $\chi_i$ is of the form $n(action_{j_1}(T) \mid \cdots \mid action_{j_q}(T))m$. 

Partial procedural knowledge with known state distances. Partial procedural knowledge 
with known state distances can be formulated by 

$c : \chi_1(T_1) \sqsubset_{k_{1,2}} \chi_2(T_2) \sqsubset_{k_{2,3}} \cdots \sqsubset_{k_{n-1,n}} \chi_n(T_n)$ (5) 

where $\chi_i(T_i)$ is of the form $action_i(T_i)$ or $n(action_1(T) \mid \cdots \mid action_l(T))m$, and $k_{i,i+1}$ 
is a number representing the state distance of $T_i$ and $T_{i+1}$. This rule says, under the 
condition represented by $c$, (1) if all the actions in the sequence occur in a plan, their 
state distances must be those specified in the rule; and (2) if, for all $j$, $1 \le j \le i$, all 
actions represented by $\chi_j(T_j)$ occur at state $T_j$ in a plan in the sequence specified by 
the rule, then action(s) $\chi_{i+1}(T_{i+1})$ must occur in the same plan $k_{i,i+1}$ states later. 

Partial PK provides a single directed dependency relation between sets of actions. 
The action(s) at the right side of $\sqsubset$ depend on all of the action(s) at the left side, but the 
converse need not be true. This relation is useful for pruning search space in planning. 
Once the left side action(s) occurs in a plan, the right side action(s) can be directly added 
into the same plan. As an example, the following partial PK in the gripper problem 

$(goalRoom(R_2),\ at(robot, R_1, T_1)) : pickup(G, A, T_1) \sqsubset_1 move(robot, R_1, R_2, T_2)$ 

says that if the robot picks up ball $A$ at state $T_1$, it must move to the ball’s goal room at 
the next state. However, it is not always true that if the robot moves to the goal room it 
must pick up A at the previous state, since this room is the destination of many balls. 
Partial PK can be encoded as: for each Xi{Ti) where 2 < i < n, construct the rule 

Xi( T i) <- c,xi(Ti), ...,x , i _i(T i _i),T 2 - Ti = k h2 ,-,Ti - (6) 

where x'i(Ti) = Xi(Ti) if Xi(Ti) is of the form actionfTj), and if Xi(Ti) is of the form 
n{action\(T)\ ■ ■ ■ \actioni(T))m then x'i(Ti) = n{action\(T ), • • • , actioni(T)}m. 
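The construction of Rule 6 can be sketched in Python as a small generator that prints one rule per action $x_i$ with $2 \le i \le n$. This is only an illustration under stated assumptions: the function name `encode_partial_pk`, the `:-` arrow and the `T2 - T1 = 1` spelling of the distance test are hypothetical choices made here, and the output is not claimed to be valid input for any particular grounder.

```python
def encode_partial_pk(condition, actions, distances):
    """Render one Rule-6-style rule per action x_i with 2 <= i <= n.

    condition -- list of literals making up c
    actions   -- literals x'_1(T1), ..., x'_n(Tn), written with time variables T1..Tn
    distances -- distances[j] is the state distance between action j+1 and action j+2
    """
    if len(distances) != len(actions) - 1:
        raise ValueError("need exactly one distance per adjacent pair of actions")
    rules = []
    for i in range(1, len(actions)):
        # Body: the condition c, all earlier actions, and the distance tests so far.
        body = list(condition) + list(actions[:i])
        body += [f"T{j + 2} - T{j + 1} = {distances[j]}" for j in range(i)]
        rules.append(f"{actions[i]} :- {', '.join(body)}.")
    return rules


# The gripper example from the text: pick up a ball, then move one state later.
gripper_rules = encode_partial_pk(
    condition=["goalRoom(R2)", "at(robot, R1, T1)"],
    actions=["pickup(G, A, T1)", "move(robot, R1, R2, T2)"],
    distances=[1],
)
for rule in gripper_rules:
    print(rule)
```

For a sequence of $n$ actions the sketch yields $n - 1$ rules, each with a longer body than the previous one, which matches the "for each $x_i(T_i)$" wording of the encoding.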

Procedural knowledge with unknown state distances. This can be expressed by
Rule 5 or Rule 3 without subscripts under the symbol for ordering. Suppose that, if $action_1$
and $action_2$ occur in a plan, then $action_1$ should occur before $action_2$. Since the
distance is unknown, given $action_1$ there could be many candidate states for $action_2$.
Although one can use a choice rule with the set of actions in all candidate states as
the head, the number of candidate states will multiply when we specify a sequence of
such actions. Encoding by constraints is a good choice in this case, especially for two
(sets of) actions $x_i(T_i)$ and $x_j(T_j)$, where $x_i(T_i)$ should precede $x_j(T_j)$ under the
condition $c$:

$$\leftarrow x'_i(T_i),\ x'_j(T_j),\ c,\ T_i > T_j.$$

where $x'_i(T_i) = x_i(T_i)$ if it is of the form $action_i(T_i)$, and if $x_i(T_i)$ is of the form
$n(action_1(T) \mid \cdots \mid action_l(T))m$, then $x'_i(T_i) = n\{action_1(T), \ldots, action_l(T)\}m$.
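The effect of such an ordering constraint on a candidate plan can be illustrated with a short Python check. It is a sketch, assuming a plan is given as a list of `(action, state)` pairs and that the condition $c$ already holds; the function name `violates_ordering` is hypothetical.

```python
def violates_ordering(plan, first, second):
    """Return True if the constraint body is satisfied, i.e. the plan is rejected.

    plan   -- list of (action_name, state) pairs
    first  -- action that should precede `second`
    second -- action that should come after `first`
    """
    first_states = [t for action, t in plan if action == first]
    second_states = [t for action, t in plan if action == second]
    # The constraint fires when some occurrence of `first` is later than
    # some occurrence of `second` (T_i > T_j).
    return any(ti > tj for ti in first_states for tj in second_states)


good_plan = [("pickup", 1), ("move", 2), ("putdown", 3)]
bad_plan = [("putdown", 1), ("move", 2), ("pickup", 3)]
print(violates_ordering(good_plan, "pickup", "putdown"))
print(violates_ordering(bad_plan, "pickup", "putdown"))
```

Note that the check never enumerates candidate states for the second action, which is the reason the text prefers a constraint over a choice rule here.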

Procedural knowledge with unknown objects. From the coding point of view, the 
problem of capturing unknown objects in action predicates is identical to the one of 
capturing unknown state distances. Thus, the discussion for PK with unknown state 
distances applies. 

3.4 Symmetry Knowledge 

Symmetry knowledge (SK) is the knowledge of checking and breaking symmetries. To 
capture symmetry knowledge, we must discover symmetric objects and break them. 

Definition 1. [22] 

Two plans $M_1$ and $M_2$ are isomorphic if $M_1$ can be obtained from $M_2$ by exchanging
an object $e$ with another object $e'$ in $M_2$. Objects $e$ and $e'$ are called symmetric objects.

Symmetry discovery. In ASP for planning, two objects $obj_1$ and $obj_2$ in a planning
domain are symmetric at the current state $T$ if they have the following properties:

1. The properties of $obj_1$ and $obj_2$ that are required to hold at the goal state must be
the same, and must not hold at the current state.

2. They are from the same domain (that is, of the same type).

3. Their properties which hold at the current state $T$ are the same.

4. The same action cannot be performed on $obj_1$ and $obj_2$ at the same state, but the
different orders (of the action working on them) will derive isomorphic plans².

In the gripper problem for example, there are two classes of objects: grippers (left 
and right) and balls. Both of these two types of objects are possible symmetric objects. 
The balls A and B are symmetric objects when: 

1. The properties which are required to hold at the goal state are the same: staying in 
the goal room. At the current state, they are both in the initial room. 

2. They are of the same type (both are balls). 

3. Their properties at the current state are the same: they are located in the same room, 
the initial room, not being held by any grippers. 

4. Action pickup(Gripper, Ball, T) cannot be performed on two balls at the same time
since one gripper can hold only one ball at a time. However, the plan where the
gripper first picks up ball A is isomorphic to the plan where the same gripper first
picks up ball B.

To detect symmetry, a new predicate is constructed for each type of symmetric object.
In the rule scheme below, the body consists of properties 1-3 for $obj_1$ and $obj_2$.
Property 4 cannot be encoded using predicates or inference rules. It depends
on the judgment of the user.

$$\begin{aligned}
symmetric(objType, A, B, T) \leftarrow{} & state(T),\ objType(A),\ objType(B),\\
& goal\_properties(A),\ goal\_properties(B),\\
& current\_properties(A, T),\ current\_properties(B, T). \qquad (7)
\end{aligned}$$

The properties of A and B which hold at the current state T must not be the same as 
their properties that are required to hold at the goal state. 

Take the object type ball in the gripper domain as an example. We construct the 
following rule to detect symmetric objects: 

$$\begin{aligned}
symmetric(ball, A, B, T) \leftarrow{} & ball(A),\ ball(B),\ state(T),\ initRoom(R_1),\ R_1 \neq R_2,\\
& goalAt(A, R_2),\ goalAt(B, R_2),\ at(A, R_1, T),\ at(B, R_1, T).
\end{aligned}$$

where $ball(A)$ and $ball(B)$ correspond to property number 2. Since there are only
two rooms, $R_2$, which is not the initial room, must be the goal room. $initRoom(R_1)$,
$at(A, R_1, T)$ and $at(B, R_1, T)$ are properties that hold at the current state $T$ for $A$ and
$B$. $goalAt(A, R_2)$ and $goalAt(B, R_2)$ are properties that must hold at the goal state.
Since balls $A$ and $B$ are in the initial room, their properties that must hold at the goal
state do not hold at the current state.
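The detection rule for balls can be mirrored by a short Python function that lists the pairs the rule would relate. This is a sketch under assumptions made here: the current state is given as a dictionary from ball to room, the goal as a dictionary from ball to goal room, and the name `symmetric_balls` is hypothetical. Unlike the rule, which derives ordered pairs, the sketch returns each unordered pair of distinct balls once.

```python
from itertools import combinations


def symmetric_balls(location, goal, init_room):
    """Return pairs of distinct balls satisfying properties 1-3 at the current state.

    location  -- dict: ball -> room the ball is in at the current state
    goal      -- dict: ball -> room the ball must be in at the goal state
    init_room -- the initial room (R1 in the rule)
    """
    # A ball qualifies if it is still in the initial room and its goal is elsewhere.
    candidates = [
        ball for ball in sorted(location)
        if location[ball] == init_room and goal[ball] != init_room
    ]
    # Two candidates are symmetric if they share the same goal room.
    return [(a, b) for a, b in combinations(candidates, 2) if goal[a] == goal[b]]


location = {"a": "r1", "b": "r1", "c": "r2"}
goal = {"a": "r2", "b": "r2", "c": "r2"}
print(symmetric_balls(location, goal, "r1"))
```

Ball `c` is excluded because it already sits in its goal room. Property 4 is not checked, in line with the remark that it depends on the judgment of the user.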

Symmetry breaking. There are several ways to break symmetries. Symmetry breaking
during search [10] is a recent development which adds constraints during search, to ensure
that any assignment symmetric to the one already considered will not be explored.
This approach requires a modification of the underlying planning algorithm.

² This property is essential, since if $obj_1$ and $obj_2$ can be "performed" by the same action at the
same state, we do not need to differentiate them.

Another way to break symmetry is to add constraints to a program in order to force
symmetric solutions into one [8, 19]. This is the approach adopted in our work. First,
symmetric objects form an ordered list. Then, we ensure that the relevant action rules
are applied only to the least element among all symmetric objects.

New predicates are constructed to assign each symmetric object an order. Suppose
$a_1, a_2, \ldots, a_n$ is a list with elements from domain $objType$. Such a list can
be represented using the predicates $head(objType, a_1)$, $list(objType, a_2, a_1)$, ...,
$list(objType, a_n, a_{n-1})$, where $head(objType, a_1)$ represents that the list contains the
objects of type $objType$ and the head of the list is $a_1$, $list(objType, a_2, a_1)$ tells that
in the list $a_2$ follows $a_1$, and so on. Under this representation, every object in the list
can be given a number as its order, as done by the following rules:

$$order(objType, A, 1) \leftarrow head(objType, A). \qquad (8)$$

$$order(objType, B, N + 1) \leftarrow list(objType, B, A),\ order(objType, A, N). \qquad (9)$$

Rule 8 initializes the order of the head element in the list as 1, and Rule 9 gives an in- 
cremental number to the element in the list and the last element gets the largest number. 
These rules guarantee that each object in the list gets a distinct number. 
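How Rules 8 and 9 number the list can be traced with a small Python sketch. It assumes the $head$ and $list$ facts are supplied as a head element plus a dictionary mapping each element to the element it follows; the function name `assign_order` is hypothetical.

```python
def assign_order(head, follows):
    """Number list elements the way Rules 8 and 9 do.

    head    -- the head element (order 1, as in Rule 8)
    follows -- dict: element B -> element A, meaning B follows A in the list
    """
    # Invert the facts so that each element points to its successor.
    successor = {previous: element for element, previous in follows.items()}
    order = {head: 1}
    current = head
    while current in successor:
        nxt = successor[current]
        if nxt in order:
            raise ValueError("the list facts contain a cycle")
        # Rule 9: the successor gets the order of its predecessor plus one.
        order[nxt] = order[current] + 1
        current = nxt
    return order


print(assign_order("a1", {"a2": "a1", "a3": "a2"}))
```

Each element is visited once, so every object receives a distinct number and the last element receives the largest one.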

To get the object with the least number in a list, the following rules can be used: 

$$\begin{aligned}
greater(objType, A, B, T) \leftarrow{} & symmetric(objType, A, B, T),\ N_b > N_a,\\
& order(objType, A, N_a),\ order(objType, B, N_b). \qquad (10)
\end{aligned}$$

$$\begin{aligned}
nonLess(objType, 1, A, T) \leftarrow{} & symmetric(objType, A, B, T),\\
& greater(objType, A, B, T),\ A \neq B. \qquad (11)
\end{aligned}$$

$$\begin{aligned}
least(objType, 1, A, T) \leftarrow{} & not\ nonLess(objType, 1, A, T),\\
& symmetric(objType, A, B, T). \qquad (12)
\end{aligned}$$

To break symmetries, the least symmetric object should be chosen. For this, an extra
condition can be added to the body of an action rule on symmetric objects as

$$HeadAction(Obj, SymObj, \ldots, T) \leftarrow body,\ least(objType, 1, SymObj, T). \qquad (13)$$

where $SymObj$ denotes a symmetric object, and $HeadAction(Obj, SymObj, \ldots, T)$
and $body$ denote the head and the body literals, respectively, of the original rule.

If parallel actions are allowed on the same set of symmetric objects, or we allow
more than one least element (sometimes it is desirable not to break all symmetries; see
Subsections 5.4 and 5.6), we need to construct additional rules to identify the least N
objects.
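The intended effect of selecting the least N symmetric objects can be sketched in Python as follows. This is an illustration of the prose description ("the least element among all symmetric objects"), not a transliteration of Rules 10-12; the function name `least_n` and the dictionary representation of the order are assumptions made here.

```python
def least_n(symmetric_objects, order, n=1):
    """Return the n symmetric objects with the smallest order numbers.

    symmetric_objects -- iterable of objects that are mutually symmetric at a state
    order             -- dict: object -> its order number in the list
    n                 -- how many least elements to keep (1 breaks all symmetries)
    """
    ranked = sorted(symmetric_objects, key=lambda obj: order[obj])
    return ranked[:n]


order = {"a": 1, "b": 2, "c": 3}
print(least_n({"b", "c", "a"}, order))        # only the least element may be acted on
print(least_n({"b", "c", "a"}, order, n=2))   # two grippers acting in parallel
```

With `n=1` an action rule extended as in Rule 13 would apply to a single object; a larger `n` leaves some symmetry in place, which is the situation the paragraph above describes.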

3.5 Distance Knowledge 

If the lower bound on the number of steps needed to change the properties of an object
(or all objects of a type) that hold at the current state into the properties that hold at the
goal state is greater than the number of available steps, then the current partial plan cannot
lead to a plan within the given number of steps.

In the following, Rule 14 expresses the computation of the lower bound; to stop the
search once the lower bound is greater than the number of available steps, we construct
Rule 15, where $steps$ is the maximum length of the plan.

$$\begin{aligned}
lowerBound(Obj, N, T) \leftarrow{} & current\_Conditions(Obj, T),\ goal\_Conditions(Obj),\\
& other\_facts,\ N = formulaToComputeLowerBound. \qquad (14)
\end{aligned}$$

$$\leftarrow lowerBound(Obj, N, T),\ N > (steps - T). \qquad (15)$$
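The pruning test of Rule 15 amounts to a single comparison, which the following Python sketch makes explicit. The function names and the dictionary of per-object lower bounds are assumptions made for illustration; how each lower bound is computed (Rule 14) is domain specific and is not modelled here.

```python
def exceeds_budget(lower_bound, steps, t):
    """Body of Rule 15: true when N > (steps - T)."""
    return lower_bound > steps - t


def prune(lower_bounds, steps, t):
    """Reject the partial plan at state t if any object cannot reach its goal in time.

    lower_bounds -- dict: object -> lower bound N on the steps it still needs
    steps        -- maximum plan length
    t            -- current state
    """
    return any(exceeds_budget(n, steps, t) for n in lower_bounds.values())


# With 11 steps in total and the search at state 7, four steps remain.
print(prune({"ball_a": 3, "ball_b": 5}, steps=11, t=7))
print(prune({"ball_a": 3, "ball_b": 4}, steps=11, t=7))
```

In the first call `ball_b` needs at least 5 steps but only 4 remain, so the partial plan is rejected; in the second call every bound fits within the remaining steps.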



4 Experiments Setup and Benchmarks 



We have implemented a range of test domains to determine how simple it is to specify 
domain dependent knowledge using our encodings, and how effective the domain de- 
pendent knowledge is for pruning the search space in planning. In our empirical tests 
we run Smodels 2.26 on a Pentium IV 1.5GHz machine with 1GB RAM. 

We tested our encodings with five planning domains. The logistics domains [3], the 
blocks world domains [11] and the gripper domains are from the CPlan testbed. The 
elevator planning domains are from the AIPS planning competition in 2000 [2] while 
the ferry domains are from TLPlan distribution [1]. In our encoding of these problems 
parallel actions are allowed. 

Experiments are conducted as follows: for each type of domain dependent knowl- 
edge, and for each of the five benchmarks, we run instances without any domain de- 
pendent knowledge, and then the same instances with the encoding of the knowledge 
added. In this way, we could see the effect of each type of domain knowledge. We 
sometimes had to design specific instances where a particular type of knowledge is not 
strong in order to see the impact of the overhead. 

The gripper problem has been described in Section 2. The blocks world problem is 
well-known. The remaining three benchmark problems are described here. 

Logistics World : There are two types of vehicles, trucks and airplanes. Trucks can be 
used to transport packages within a city, and airplanes to transport packages between 
airports. The problem in this domain starts with a collection of packages at various loca- 
tions in various cities, and the goal is to redistribute these packages to their destinations. 

Ferry World : With actions of boarding and unboarding, the ferry can transport, one car 
at a time, all the cars to their destinations. The sources and destinations of cars may be 
different. 

Elevator Domain: There is an elevator which will transport passengers from their initial 
floors to their goal floors. There are three types of objects: the elevator called lift, a 
number of passengers, and several floors. The elevator can move up and down to any 
floor. When it stops at a floor, any passenger in the elevator can be unboarded, and any 
passenger waiting at the floor can be boarded on to the elevator. The goal of the problem 
is to transport all the passengers to their destination floors. 
Table 1. Experiments for the gripper problem with/without domain knowledge (have solution).
Times are in seconds.

| #ball | steps | without | ESK | SRK | PK01 | PK02 | PK03 | PK01+02 | SK | DK |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 7 | 0.78 | 0.13 | 0.12 | 0.53 | 0.18 | 0.09 | 0.19 | 0.06 | 0.13 |
| 5 | 11 | 136.07 | 11.02 | 5.76 | 46.33 | 65.40 | 23.87 | 12.55 | 0.88 | 199.53 |
| 6 | 11 | 2611.74 | 11.89 | 21.23 | 440.77 | 148.34 | 177.41 | 23.74 | 9.53 | 549.63 |
| 7 | 15 | >2 days | 2947 | 788.41 | 47132 | 38435 | 11550 | 7245 | 298.39 | >2 days |



5 Experimental Results 

In the tables that follow³, the column labeled "steps" is the length of the plan, and the group
column labeled "Time (seconds)" shows the search time to find the first plan, or
the answer no when the instance has no solution. Each column under it either shows
the search time without any domain dependent knowledge (under the column labeled
"without"), or the search time with a particular kind of domain dependent knowledge.

Any test run will be stopped after 2 days of running, in which case the corresponding 
entry is filled with “> 2 days”. 

5.1 Experiments with End State Knowledge 

In each table, the experimental results with ESK are shown in the column under the
label ESK. End state knowledge can be extracted from all transportational problems.
Once an object gets to its goal location, it should not be moved after that. Also, its goal
location is not related to the goal locations of other objects. ESK for the blocks world
problem is not strong, since the positions of the blocks are interrelated. That a block reaches
its own goal position does not mean it will stay at that position; this depends on whether
all the blocks under it are at their goal positions.

The performance gain for ESK is impressive. The more difficult an instance is, the
better ESK works. For example, in Table 1, when there are 5 balls, the encoding with ESK is
about 12 times faster than the one without it (136.07 s versus 11.02 s). When the number of
balls is 6, the encoding with ESK is more than 200 times faster (2611.74 s versus 11.89 s).
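The speedup figures quoted for ESK follow directly from the search times in Table 1, as the short Python calculation below shows. The dictionary layout and the function name `speedup` are choices made here for illustration; the numbers are copied from the table.

```python
# Search times in seconds, copied from Table 1 (gripper, instances with a solution).
TABLE1 = {
    5: {"without": 136.07, "ESK": 11.02},
    6: {"without": 2611.74, "ESK": 11.89},
}


def speedup(row, column):
    """Ratio of the time without domain knowledge to the time with `column`."""
    return row["without"] / row[column]


for balls, row in TABLE1.items():
    print(f"{balls} balls: ESK speedup = {speedup(row, 'ESK'):.1f}x")
```

The same ratio can be computed for any other column of the tables to compare the kinds of knowledge on a given instance.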

5.2 Experiments with State Relationship Knowledge 

These results appear under the table column labeled SRK. For transportational problems,
the knowledge that a moved object should not be moved back involves the location property
at three states. This knowledge is used in the experiments of all domains except the blocks
world, for which two other experiments are conducted. The column SRK01 in Table 7 employs
the SRK: do not move a block from table to table, while SRK02 uses the knowledge: do
not move block X to block Y if Y is not on a good tower. In the latter case, we defined
a new predicate goodTower(X, T)⁴.

State relationship knowledge is a reliable source for pruning the search space.

³ For lack of space, we organized the experimental results for one benchmark into one table.
The reader needs to check out a column from each table when reading the text.

⁴ That this particular domain knowledge for the blocks world is very effective for space pruning
is previously known.
5.3 Experiments with Procedural Knowledge

The blocks world problem has very few sequential patterns in final plans.

For the gripper domain, we experimented with several types of procedural knowledge.
The labels PK0n below are the column labels in the tables.

1. PK01. A gripper must pick up a ball before putting it down. This is partial PK with
known state distance.

2. PK02. If a gripper picks up a ball at T, it must put it down at T + 2, and vice versa.
This is total PK with known state distance.

3. PK03. If the robot is in room R, then its movement from R to another room occurs
after a pickup or putdown action, and vice versa. This is total PK with
unknown state distance.

We note that, between PK01 and PK02, neither is strictly stronger than the other: neither
rule prunes all the wasted actions that the other could. Applying these two rules together
therefore works better. The results are given under the column labeled PK01+02.

The procedural knowledge used for the other domains is as follows:

- PK05 (logistics): Truck loads a package, and then at some later state unloads it. 

- PK06 (elevator): Do not stop at a floor if the lift does not board or unboard any 
passengers at the next state. 

- PK07 (ferry): Unloading a car must occur two states later than boarding the same 
car at a different location. 

- PK08 (ferry): If ferry boards a car at state T, then the ferry should move to its 
destination. 

We obtained performance gains for almost all the test cases that stopped within 2 days. 

5.4 Experiments with Symmetry Knowledge 

Not every planning domain has symmetry. In the blocks world, since parallel actions 
are allowed, there is no symmetry left. Changing the order of action move will totally 
change the plan. Also, any two blocks that have the same properties must be at the 
table and be clear. Then, these two blocks can be moved at the same state. They are not 
symmetric objects. 

We conducted experiments for the gripper problem, the logistics problem, and the 
ferry problem, based on the encoding proposed in Section 3. It turns out that symmetry 
knowledge for the gripper problem is very effective, due to the fact that the presence of 
symmetry is very strong and easy to identify - all the balls that have not been moved 
are all symmetric. 

The use of symmetry knowledge is generally less effective for the logistics problem,
because the presence of symmetry is not substantial. Our analysis shows that two trucks
(or two airplanes) can be symmetric objects only when, at a state T, they have not loaded
any packages and they are at the same location. This case rarely happens.
In the instances tested for the logistics domain, p01, p02 and p04 have little symmetry,
and as such the overhead makes the performance worse, while p05 and p07 have some
symmetries; in one of these the performance gain is minimal, while in the other a hard
instance is solved efficiently.

Table 2. Experiments for the gripper problem with/without domain knowledge (no solution).
Times are in seconds.

| #ball | steps | without | ESK | SRK | PK01 | PK02 | PK01+02 | SK | DK |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 6 | 0.28 | 0.06 | 0.07 | 0.21 | 0.10 | 0.08 | 0.05 | 0.01 |
| 5 | 10 | 372.99 | 6.07 | 9.08 | 96.29 | 60.32 | 18.45 | 2.73 | 0.09 |
| 6 | 10 | 937.72 | 10.96 | 15.63 | 178.41 | 132.45 | 34.91 | 3.86 | 0.09 |
| 7 | 14 | >2 days | 1350.55 | 2990.46 | >2 days | >2 days | >2 days | 558.52 | 0.16 |

For the ferry problem, the effect of SK heavily depends on the problem instance. If, 
at the initial state, many cars are at the same location and they have the same goal loca- 
tion, then these cars are symmetric objects for action board. Otherwise, the symmetry 
is not strong. In Tables 5 and 6, f03 and f03n are designed without symmetries. 

5.5 Experiments with Distance Knowledge 

The effect of utilizing distance knowledge depends on two factors. First, the larger the
lower bound is, the earlier pruning can occur, and the more effective it therefore is. The
second concerns the amount of overhead in computing the lower bound at each state.

For the gripper domain, the lower bound can be computed by a formula for any
state: it is the number of steps needed to transfer all the balls that are not in their goal
room to their goal room. If a test case has no solution, it involves no search and stops
right away (see Table 2).

For the ferry domain, due to parallel actions, there is no easy formula for the com- 
putation of the minimum steps to move all the cars to their destinations. We may choose 
the biggest lower bound among all lower bounds for individual cars. In this case, before 
the lower bound becomes effective, the properties of all the objects in the domain have 
to be checked in order to compute it. The effort required to do this is not much less than 
that required to backtrack without checking the lower bound, which is why distance 
knowledge in the ferry domain tends to make search performance worse. Also, in the 
ferry domain, the distance knowledge only works at the last five states (corresponding 
to the case where the car and the ferry are at different locations and the ferry is not 
empty). This is why in most of our test instances, the performance becomes worse. This 
discussion also applies to the elevator problem. 

For the logistics domain, although the distance knowledge works the same way as in
the ferry domain, the lower bound for some packages can be nine, which is comparable
to the length of the final plans. The distance knowledge can therefore work at very early
states, which is why distance knowledge works better in the logistics domain than in the
ferry domain. This discussion also applies to the blocks world domain.

5.6 Summary of Experimental Results 

Our experiments show that each class of domain dependent knowledge can prune some 
search space, but the extent depends on the planning domain and the type of knowledge. 
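Table 3. The elevator problem with/without domain knowledge (have solution).
Times are in seconds; the header and entries of the fourth column are illegible in the source.

| instance | #passenger | #floor | (illegible) | without | ESK | SRK | PK06 | DK |
|---|---|---|---|---|---|---|---|---|
| E02 | 3 | 5 | (illegible) | 2.14 | 0.13 | 1.75 | 0.75 | 2.52 |
| E03 | 4 | 7 | (illegible) | 70.01 | 2.50 | 50.43 | 11.48 | 95.21 |
| E04 | 5 | 9 | (illegible) | 853.86 | 13.40 | 608.79 | 115.33 | 7151.46 |
| E05 | 6 | 11 | (illegible) | 3676.65 | 112.91 | 890.52 | 247.11 | >2 days |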



Table 4. The logistics world with/without domain knowledge (have solution).
Times are in seconds.

| instance | #package | steps | without | ESK | SRK | PK05 | SK | DK |
|---|---|---|---|---|---|---|---|---|
| p01 | 6 | 9 | 2.16 | 1.29 | 1.27 | 2.06 | 4.83 | 2.33 |
| p02 | 5 | 7 | 1.88 | 2.39 | 1.24 | 1.63 | 12.73 | 3.08 |
| p04 | 7 | 9 | 3.16 | 1.31 | 2.29 | 2.90 | 8.00 | 1.98 |
| p05 | 4 | 11 | 70.28 | 56.07 | 51.71 | 60.86 | 58.31 | 66.31 |
| p07 | 10 | 9 | >2 days | 43.11 | 41.26 | >2 days | 320.43 | 119.61 |



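Table 5. The ferry problem with/without domain knowledge (have solution).
Times are in seconds.

| instance | #car | #location | steps | without | ESK | SRK | PK07 | PK08 | SK | DK |
|---|---|---|---|---|---|---|---|---|---|---|
| f01 | 4 | 4 | 11 | 1.04 | 0.15 | 0.84 | 2.22 | 0.32 | 0.63 | 1.55 |
| f02 | 5 | 4 | 12 | 3.09 | 0.56 | 2.15 | 1.80 | 3.25 | 1.41 | 2.63 |
| f03 | 5 | 6 | 12 | 7.29 | 1.93 | 12.68 | 3.71 | 2.94 | 10.49 | 10.26 |
| f04 | 5 | 4 | 15 | 124.44 | 3.08 | 40.15 | 1.70 | 2.66 | 45.10 | 165.96 |
| f05 | 4 | 4 | 16 | 150.61 | 112.50 | 61.04 | 16.68 | 4.67 | 8.58 | 436.96 |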



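Table 6. The ferry problem with/without domain knowledge (no solution).
Times are in seconds.

| instance | #car | #location | steps | without | ESK | SRK | PK07 | PK08 | SK | DK |
|---|---|---|---|---|---|---|---|---|---|---|
| f01n | 4 | 4 | 10 | 0.48 | 0.10 | 0.31 | 0.38 | 0.47 | 0.28 | 0.54 |
| f02n | 5 | 4 | 11 | 2.72 | 0.36 | 2.15 | 0.99 | 1.37 | 0.92 | 2.93 |
| f03n | 5 | 6 | 11 | 12.78 | 1.51 | 4.14 | 2.93 | 3.31 | 4.52 | 9.76 |
| f04n | 5 | 4 | 14 | 45.86 | 4.86 | 30.93 | 3.88 | 21.38 | 8.12 | 55.60 |
| f05n | 4 | 4 | 15 | 122.94 | 36.72 | 22.35 | 11.12 | 13.68 | 7.43 | 136.15 |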



End state knowledge, state relationship knowledge and procedural knowledge can 
prune bad actions for almost all the planning domains and the improvement can be up 
to 3 orders of magnitude. 

Symmetry knowledge can improve search efficiency, but the overhead is a concern. 
When symmetry is strong in a planning domain, its use improves search efficiency 
significantly. Otherwise, the encoding of the knowledge may increase the search space. 
Therefore, breaking all the symmetries in the problem may not always be an optimal 
strategy; sometimes we may leave some symmetries in the program. 
Table 7. The blocks world problem with/without domain knowledge (have solution).
Times are in seconds.

| instance | #blocks | steps | without | ESK | SRK01 | SRK02 | DK |
|---|---|---|---|---|---|---|---|
| bw.16 | 16 | 8 | 3.20 | 2.21 | 3.08 | 1.09 | 8.33 |
| bw.17 | 17 | 9 | 2424.42 | 2202.83 | 7.78 | 1.75 | 17.73 |
| bw.18 | 18 | 10 | 3146.80 | 2728.86 | 53.79 | 2.58 | 102.39 |
| bw.20 | 20 | 10 | 22.70 | 10.49 | 13.14 | 1.56 | 102.23 |
| bw.21 | 21 | 11 | >2 days | >2 days | 24.56 | 2.10 | 112.47 |
| bw.25 | 25 | 15 | >2 days | >2 days | >2 days | 17.80 | >2 days |



Distance knowledge can improve search efficiency, but not for every domain. If the 
lower bound for all the objects in the domain can be computed, it can be used as a 
checker to stop an invalid partial plan at an early stage. If only the lower bound for each 
object in the domain can be computed, the improvement depends on the coverage of the 
distance knowledge. If the distance knowledge only works on the last few states, then 
the overhead may outweigh the benefit. If the distance knowledge works at almost all 
the states, the improvement can be significant, especially for hard instances. 

Combining different classes of domain dependent knowledge can prune more search
space. For example, we can add end state knowledge and symmetry knowledge to a
logic program representing a planning problem. Since these two classes of knowledge
prune different sets of redundant actions, the encoded knowledge can complement each
other. Take the gripper problem as an example. The end state knowledge can prune
redundant actions for the balls that are already in the goal room: if the balls are in the
goal room, then no actions need to be undertaken for these balls. On the other hand,
symmetry knowledge prunes bad actions on the balls that are not in the goal room.
Therefore, combining these two classes of knowledge can prune more search space.

The use of some knowledge may make other knowledge useless. For example, in the
same gripper problem, suppose we apply procedural knowledge to specify that the sequence
"a gripper first picks up a ball, then moves it to the goal room, puts it down, and moves
back to the beginning room" is in final plans. Suppose that, at the same time, we also apply
end state knowledge saying that when a ball a and a gripper g are in the beginning room at
any state T, action putdown(g, a, T) should not be taken, and if the ball is in the goal room,
action pickup(g, a, T) should not be taken. Then the state relationship knowledge "do
not move a ball back to a room" becomes useless after applying the above two classes
of knowledge.

6 Related Work and Future Direction 

Procedural knowledge has been used in GOLOG, a logic programming language based 
on a situation calculus theory of actions [13,7]. The range of procedural knowledge 
described in this report is far wider than that used in GOLOG. We can capture not only 
sequential information for specific actions, but also the sequence of sets of actions, even 
the sequence of the whole plan can be specified. 

In [20] constraints on sequence of actions are expressed by temporal formulas 
whose interpretation is realized by an answer set program. Our work emphasizes the 
patterns of procedural knowledge that appear frequently in planning domains. Since we
do not rely on an interpreter written as part of the planning program, the overhead in 
our method seems to be generally smaller. 

Distance knowledge was first used in CPlan [22], where lower and upper bounds on
how many steps are needed for a variable to change from one value to another are computed
when applying distance knowledge. In CPlan, these bounds are computed only at the
initial state. Since, in our encodings, the length is given, the upper bound becomes 
irrelevant. The lower bound is however extended to any state except the goal state. 

A pending investigation is to identify the classes of new atoms in the coding of 
domain dependent knowledge that are non-split , in the sense that their values can be 
determined solely by the atoms in the original planning solution. As such, these atoms 
need not participate in the search, and as a result, the overhead can be reduced. 

Based on the formalization and standard encoding of each class of domain knowl- 
edge, an important area for future research is to design a system for automatically gener- 
ating and translating domain specific knowledge to be added to the encoding of planning 
domains in answer set programming. 
Abstract. A novel style of multi-agent system specification and deployment is
described, in which familiar methods from computational logic are re-interpreted
in a new context. One view of multi-agent system design is that coordination
is achieved via an interaction model in which participating agents assume roles
constrained by the social norms of their shared task, the state of the interaction
reflecting the ways these constraints are mutually satisfied within some system for
synchronisation that is open and distributed. We show how to harness a process
calculus, constraint solving, unfolding and meta-variables for this purpose and
discuss the advantages of these methods over traditional approaches.



1 Introduction 

We are interested in the specification and deployment of multi-agent systems, which 
we define as systems of distributed components in which components can usefully be 
viewed as autonomous problem solvers that must collaborate in order to perform com- 
plex tasks. There are numerous difficulties in constructing such systems, beyond the 
normal difficulties associated with building individual agents. These specific issues in- 
clude the following: 

- Maintaining the conformity of interaction necessary to perform a shared task reli- 
ably without sacrificing the autonomy of each agent. One solution to this problem 
is to define a model of the interaction with which agents interact (via some appro- 
priate controller) in order to perform a given task. Control and state information 
essential to the task resides in that model, minimising the impact on individual 
agents. 

- Ensuring that when necessary an agent can determine its current role and obliga- 
tions in the interaction, and not requiring that any agent monitor the interaction 
when that is unnecessary. 

- Allowing constraints on variables established locally by agents to be shared by 
other agents if appropriate and for those others to be able to adapt these constraints. 

Although it may seem surprising that standard methods from computational logic 
can be applied simply and directly to these sorts of issues, we shall explain how this 
can be done - in the process establishing a new niche for such methods. To emphasise 
the parallels between issue and method we write each section title in the form I = M, 
where I is an issue for multi-agent system design and M is the corresponding logic 
programming method. The methods taken together provide a basic formal approach to 
designing and (in appropriate circumstances) deploying multi-agent systems. Section 2
describes key aspects of earlier research by others that relate to our approach. 

In Section 9 we shall demonstrate how these methods work together on a short but 
(by current standards) complex scenario. The scenario, which concerns ordering and 
configuration of a computer, is as follows: 

An internet-based agent acting on behalf of a customer wants to buy a com- 
puter but doesn’t know how to interact with other agents to achieve this so it 
contacts a service broker. The broker supplies the customer agent with the nec- 
essary interaction information. The customer agent then has a dialogue with the 
given computer vendor in which the various configuration options and pricing 
constraints are reconciled before a purchase finally is made. 

To deal with this scenario we shall introduce the basic components of a Lightweight 
Coordination Calculus (LCC). In Section 3 we introduce the basic calculus. We then, in 
Section 4, show how this permits multi-agent social norms to be controlled by mutual 
constraints on variables determining their message passing behaviour. Sections 5 and 6 
show how traditional transformation methods may be applied to execute LCC specifi- 
cations in distributed environments. Section 7 describes how finite domain constraint 
solving can be used to make protocols less brittle. Section 8 shows how LCC protocols 
are suited to brokering of interactions between agents - a key issue for open systems 
like the Web. 

2 Background 

Although we use the popular term “agent” in our research, an interest in coordination 
of processes in open, distributed environments extends more broadly across computer 
science. Much of the topical interest has come from burgeoning technological efforts - 
in particular the semantic web and multi-agent systems. There has, however, been long 
term interest in the logic programming community. 

In [1] LCC is described from the perspective of those wishing to coordinate seman- 
tic web services, where the point of contact for LCC is the process specification compo- 
nent of (rapidly evolving) service specification languages. Seen from this perspective, 
the closest existing research is from those using temporal logics to specify different 
aspects of required service behaviours: for individual services (e.g. [2]); shared models
for coordinating services (e.g. [3]) or the process of composing services (e.g. [4, 5]). In 
[6] LCC is presented as a compact way to describe electronic institutions of the sort re- 
cently made popular in the agent community through use of finite state machine models 
of coordination [7, 8].

In logic programming there is a history of interest in parallel computation and con- 
sequently an involvement in coordinating the distributed computations in multi-agent 
systems. One form of involvement is to invent a form of logic programming language 
that gives an overall architecture for coordination. The Go! language [9], for example, 
provides a multi-threaded environment in which agents may be coordinated via a shared 
memory store of beliefs, desires and intentions. In contrast to such languages, LCC re- 
quires nothing more than a traditional Prolog system to achieve its form of coordination 
- the primary interest being in using specifications written in LCC to coordinate processes
that may individually be in different environments. Perhaps closer to LCC is
the work being done on modelling multi-agent coordination using logic programs, for 
example in [10] where the Event Calculus is used to specify and analyse social con- 
straints between agents (the motivation for this being similar to that of [4]). A translator 
has been written from LCC to a version of the Event Calculus, although most of our 
current research on verification of LCC protocols operates more directly from the LCC 
notation [11]. 



3 Interaction Model = Process Calculus 

LCC borrows the notion of role from agent systems that enforce social norms but rein- 
terprets this in a process calculus. Process calculi have been used before to specify 
social norms (see for example [7]) but LCC is, to our knowledge, the first to be used 
directly in computation for multi-agent systems. Social norms in LCC are expressed as 
the message passing behaviours associated with roles. The most basic behaviours are to
send or receive messages, where sending a message may be conditional on satisfying 
a constraint and receiving a message may imply constraints on the agent accepting it. 
The choice of constraint language depends on the constraint solvers used and we shall 
discuss this more fully in subsequent sections. More complex behaviours are specified 
using the connectives then, or and par for sequence, choice and parallelisation re- 
spectively. A set of such behavioural clauses specifies the message passing behaviour 
expected of a social norm. We refer to this as the interaction framework. Its syntax is as 
shown in Figure 1.



Framework := {Clause, ...}
Clause    := Agent :: Dn
Agent     := a(Type, Id)
Dn        := Agent | Message | Dn then Dn | Dn or Dn | Dn par Dn | null ← C
Message   := M ⇒ Agent | M ⇒ Agent ← C | M ⇐ Agent | C ← M ⇐ Agent
C         := Term | C ∧ C | C ∨ C
Type      := Term
M         := Term

Where null denotes an event which does not involve message passing; Term is a structured term
in Prolog syntax and Id is either a variable or a unique identifier for the agent.

Fig. 1. Syntax of LCC dialogue framework
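The grammar of Figure 1 can be mirrored as a small abstract syntax tree. The following is only a sketch: the class and field names are our own illustrative choices (they do not appear in LCC), and messages and constraints are kept as plain strings rather than Prolog terms.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Agent:
    """a(Type, Id): a role type plus an agent identifier."""
    role: str
    ident: str


@dataclass(frozen=True)
class Send:
    """M => Agent <- C: send msg to an agent if the constraint holds."""
    msg: str
    to: Agent
    constraint: Optional[str] = None


@dataclass(frozen=True)
class Receive:
    """C <- M <= Agent: receive msg from an agent, implying the constraint."""
    msg: str
    sender: Agent
    constraint: Optional[str] = None


@dataclass(frozen=True)
class Null:
    """null <- C: an event that involves no message passing."""
    constraint: Optional[str] = None


@dataclass(frozen=True)
class Then:
    first: "Dn"
    second: "Dn"


@dataclass(frozen=True)
class Or:
    left: "Dn"
    right: "Dn"


@dataclass(frozen=True)
class Par:
    left: "Dn"
    right: "Dn"


# Dn: everything that may appear in the body of a clause.
Dn = Union[Agent, Send, Receive, Null, Then, Or, Par]


@dataclass(frozen=True)
class Clause:
    """Agent :: Dn"""
    head: Agent
    body: Dn


# A two-role framework: r1 offers X (if p(X)) and awaits acceptance;
# r2 receives the offer and accepts (if q(X)).
r1, r2 = Agent("r1", "A1"), Agent("r2", "A2")
clause_r1 = Clause(r1, Then(Send("offer(X)", r2, "p(X)"), Receive("accept(X)", r2)))
clause_r2 = Clause(r2, Then(Receive("offer(X)", r1), Send("accept(X)", r1, "q(X)")))
framework = [clause_r1, clause_r2]
```

Frozen dataclasses are used so that clauses behave like immutable terms, which is closer in spirit to the logic programming reading of a protocol.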



LCC is not the first specification language to describe social norms, although it is be- 
lieved to be the first such logic programming language. Conversation policy languages 
(e.g. [12]) are similar to LCC in the sense that they apply constraints to the behaviours 
permitted by agents, thus giving a safe envelope of operation for agents. In Section 7 
we shall consider this issue in more detail. LCC is also temporal, since it imposes partial
orderings on message passing between agents, so in this respect it resembles efforts
in the semantic web service arena to represent individual service behaviours (e.g. [2]);
shared models for coordinating services (e.g. [3]) or the process of composing services 
(e.g. [4, 5]). A third view of LCC is as a way of describing state change during multi- 
agent interaction. In this aspect it resembles systems like Islander [7, 8] that represent 
social norms as finite state systems in which agents “move” between states according 
to given constraints. In Section 5 we describe a different view of state change but, first, 
we consider briefly the interplay in LCC between social norms and constraints. 

4 Social Norms = Mutual Constraints 

The LCC language ensures coherence of interaction between agents by imposing con- 
straints relating to the messages they send and receive in their chosen roles. The clauses 
of a protocol are arranged so that, although the constraints on each role are independent 
of others, the ensemble of clauses operates to give the desired overall behaviour. For 
instance, the LCC protocol: 

a(r1, A1) :: offer(X) ⇒ a(r2, A2) ← p(X) then accept(X) ⇐ a(r2, A2)
a(r2, A2) :: offer(X) ⇐ a(r1, A1) then accept(X) ⇒ a(r1, A1) ← q(X)        (1)

places two constraints on the variable X: the first (p(X)) is a condition on the agent in
role r1 sending the message offer(X) and the second (q(X)) is a condition on the
agent in role r2 sending message accept(X) in reply. By (separately) satisfying p(X)
and q(X) the agents mutually constrain the variable X.

How does each agent satisfy constraints? LCC allows two options: 

- Internally according to whatever knowledge and reasoning strategies it possesses. 
This is the normal assumption in most multi-agent systems, yet it is not always 
ideal. In particular we sometimes would like to use knowledge specifically for a 
social interaction but not require an agent to internalise it (e.g. if that knowledge 
might be inconsistent with an agent’s own beliefs). In such cases LCC offers a 
second option: 

- Externally using a set of Horn clauses defining common knowledge assumed for 
the purposes of the interaction. Like the LCC protocols themselves, this common 
knowledge is passed between agents along with messages during interaction (see 
Section 6) so it is ephemeral - lasting only as long as the interaction. 

In Section 7 we consider constraint satisfaction in more detail but first we describe 
the basic mechanism provided in LCC for changing the state of the interaction during 
message passing. 

5 Interaction State Change = Unfolding 

In multi-agent systems with predictable behaviours we must be able to reason about 
state change. In Section 6 we shall discuss the distinction between state that is private 
to individual agents and state associated with their interaction. Before this we describe
how the state of the interaction from the perspective of an individual agent’s role may 
change. The mechanism for performing this basic operation is a form of unfolding 
familiar from logic program transformation. 

Unfolding of a Horn clause with respect to a set of Horn clauses is done by selecting 
a unit goal in the body of that clause and matching it to the head of a copy of an 
appropriate clause in the set. The body of that matched clause then replaces the unit 
goal in the original clause. State change in LCC uses a similar method. 
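As a minimal illustration of unfolding, the sketch below works on propositional Horn clauses only, so matching a goal to a clause head is plain equality rather than unification. The representation (a clause as a pair of head and list of body goals) and the function name are assumptions made for this example.

```python
def unfold(clause, index, program):
    """Unfold the body goal at position `index` of `clause` against `program`.

    A clause is (head, [goal, ...]). One new clause is returned for every
    program clause whose head matches the selected goal: the body of the
    matched clause replaces the goal in the original clause.
    """
    head, body = clause
    goal = body[index]
    results = []
    for p_head, p_body in program:
        if p_head == goal:  # propositional matching; real unfolding unifies
            results.append((head, body[:index] + list(p_body) + body[index + 1:]))
    return results


# p <- q, u   unfolded on q, given   q <- r, s   and   q <- t
program = [("q", ["r", "s"]), ("q", ["t"])]
clause = ("p", ["q", "u"])
unfolded = unfold(clause, 0, program)
# unfolded is [("p", ["r", "s", "u"]), ("p", ["t", "u"])]
```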

Recall that the behaviour of an agent in a given role is determined by the appropriate 
LCC clause. Figure 2 gives a set of rewrite rules that are applied to give an unfolding of 
an LCC clause C_i in terms of protocol V in response to the set of received messages, M_i,
producing: a new LCC clause C_n; an output message set O_n; and remaining unprocessed
messages M_n (a subset of M_i). These are produced by applying the protocol rewrite
rules of Figure 2 exhaustively to produce the sequence from i to n:



⟨ C_i --(M_i, M_{i+1}, V, O_i)--> C_{i+1}, ..., C_{n-1} --(M_{n-1}, M_n, V, O_n)--> C_n ⟩



We refer to the rewritten clause, C_n, as an expansion of the original clause, C_i, and
write expanded(C_i, M_i, V, C_n, O_n) when this expansion is performed. In the next section
we describe how this basic expansion method is used for multi-agent coordination.
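To make the expansion mechanism concrete, here is a deliberately reduced sketch of the closed test and of the rewrites of Figure 2 for sending, receiving, sequence and choice. It is not the paper's implementation: terms are Python tuples, messages are ground strings (no unification), and constraints, par, null and role expansion are left out. All function names are illustrative.

```python
def closed(term):
    """Has this protocol term been covered by the preceding interaction?"""
    kind = term[0]
    if kind == "c":
        return True
    if kind == "or":
        return closed(term[1]) or closed(term[2])
    if kind == "then":
        return closed(term[1]) and closed(term[2])
    return False


def step(term, inbox):
    """Apply one rewrite. Returns (new_term, remaining_inbox, outputs) or None."""
    kind = term[0]
    if kind == "send":                       # a message sent out is closed
        return ("c", term), inbox, [term]
    if kind == "recv":                       # needs a matching received message
        key = (term[1], term[2])
        if key in inbox:
            return ("c", term), [m for m in inbox if m != key], []
        return None
    if kind == "then":                       # expand first part, else the next
        if not closed(term[1]):
            r = step(term[1], inbox)
            return None if r is None else (("then", r[0], term[2]), r[1], r[2])
        r = step(term[2], inbox)
        return None if r is None else (("then", term[1], r[0]), r[1], r[2])
    if kind == "or":                         # commit to whichever side expands
        if not closed(term[2]):
            r = step(term[1], inbox)
            if r is not None:
                return r
        if not closed(term[1]):
            return step(term[2], inbox)
    return None                              # closed or unsupported term


def expand(term, inbox):
    """Apply step exhaustively, collecting the output messages."""
    outputs = []
    while True:
        r = step(term, inbox)
        if r is None:
            return term, inbox, outputs
        term, inbox, new = r
        outputs += new


# Role r2 of expression 1, with X already bound to 2 for simplicity.
body = ("then", ("recv", "offer(2)", "r1"), ("send", "accept(2)", "r1"))
term, rest, out = expand(body, [("offer(2)", "r1")])
waiting, _, nothing = expand(body, [])
```

With the offer in the inbox the whole body closes and one acceptance is emitted; with an empty inbox the body is returned unchanged, which corresponds to the agent waiting for a message.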



6 Coordination = Distributed Clauses 

To coordinate an interaction between multiple agents, each agent must know its con- 
straints on when to send and receive messages. We want this to have as low an impact 
as possible on the engineering of agents so the mechanism for achieving this should be 
modular, acting as an intermediary between the agent and the medium used to transmit 
messages. The module we supply has the following elements: 

- A message encoder/decoder for receiving and transmitting messages from whatever
message passing media are being used to transport messages between agents. For 
example, if the JADE platform is being used for inter-agent communication then 
the encoder/decoder must be able to read JADE messages (which use the FIPA- 
ACL performative language) and translate these into LCC protocol expressions; 
similarly for other platforms. 

- A protocol expander that decides how to expand a protocol received via a message. 
This was described in Section 5. 

- A constraint solver capable of deciding whether constraints passed to it by the pro- 
tocol expander are satisfiable. This was introduced in Section 4 and is extended in 
Section 7. 

Given the above, expression 2 defines how an agent can react to a received message 
M addressed to its identifier, X, in the role R and carrying protocol, V. S is the store
of LCC clauses already known to the agent, from which an appropriate clause C_i may
be drawn if it has already been involved in this role or, if not, C_i may be drawn from
V. After expansion to C_n the clause is replaced in S to give new clause store S_n. The




Multi-agent Coordination as Distributed Logic Programming 






The following ten rules define a single expansion of a clause. Full expansion of a clause is 
achieved through exhaustive application of these rules. Rewrite 1 (below) expands a protocol 
clause with head A and body B by expanding B to give a new body, E. The other nine rewrites 
concern the operators in the clause body. A choice operator is expanded by expanding either side, 
provided the other is not already closed (rewrites 2 and 3). A sequence operator is expanded by 
expanding the first term of the sequence or, if that is closed, expanding the next term (rewrites 4 
and 5). A parallel operator expands on both sides (rewrite 6). A message matching an element 
of the current set of received messages, M_i, expands to a closed message if the constraint, C,
attached to that message is satisfied (rewrite 7). A message sent out expands similarly (rewrite 8). 
A null event can be closed if the constraint associated with it can be satisfied (rewrite 9). An agent 
role can be expanded by finding a clause in the protocol with a head matching that role and body 
B - the role being expanded with that body (rewrite 10). 



A :: B --(M_i, M_o, V, O)--> A :: E
    if B --(M_i, M_o, V, O)--> E

A1 or A2 --(M_i, M_o, V, O)--> E
    if ¬closed(A2) ∧ A1 --(M_i, M_o, V, O)--> E

A1 or A2 --(M_i, M_o, V, O)--> E
    if ¬closed(A1) ∧ A2 --(M_i, M_o, V, O)--> E

A1 then A2 --(M_i, M_o, V, O)--> E then A2
    if A1 --(M_i, M_o, V, O)--> E

A1 then A2 --(M_i, M_o, V, O)--> A1 then E
    if closed(A1) ∧ A2 --(M_i, M_o, V, O)--> E

A1 par A2 --(M_i, M_o, V, O1 ∪ O2)--> E1 par E2
    if A1 --(M_i, M_n, V, O1)--> E1 ∧ A2 --(M_n, M_o, V, O2)--> E2

C ← M ⇐ A --(M_i, M_i − {M ⇐ A}, V, ∅)--> c(M ⇐ A)
    if (M ⇐ A) ∈ M_i ∧ satisfy(C)

M ⇒ A ← C --(M_i, M_o, V, {M ⇒ A})--> c(M ⇒ A)
    if satisfied(C)

null ← C --(M_i, M_o, V, ∅)--> c(null)
    if satisfied(C)

a(R, I) ← C --(M_i, M_o, V, ∅)--> a(R, I) :: B
    if clause(V, a(R, I) :: B) ∧ satisfied(C)



A protocol term is decided to be closed, meaning that it has been covered by the preceding 
interaction, as follows: 



closed(c(X))
closed(A or B) ← closed(A) ∨ closed(B)
closed(A then B) ← closed(A) ∧ closed(B)
closed(A par B) ← closed(A) ∧ closed(B)
closed(X :: D) ← closed(D)

satisfied(C) is true if C can be solved from the agent’s current state of knowledge.
satisfy(C) is true if the agent’s state of knowledge can be made such that C is satisfied.
clause(V, X) is true if clause X appears in the dialogue framework of protocol V, as defined in
Figure 1.



Fig. 2. Rewrite rules for expansion of a protocol clause 
resulting set of output messages, O_n, can then be sent to appropriate agents via whatever
message encoder is provided. 



react(M, X, R, V, S, O_n, S_n) ← clause_for_role(V, a(R, X), S, C_i) ∧
                                 expanded(C_i, {M}, V, C_n, O_n) ∧
                                 replace(C_i, C_n, S, S_n)                    (2)

clause_for_role(V, a(R, X), S, a(R, X) :: D) ←
    a(R, X) :: D ∈ S ∨
    (¬(a(R, X) :: D ∈ S) ∧ a(R, X) :: D ∈ V)                                  (3)
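The shape of expressions 2 and 3 can be sketched as follows. This is an illustration under simplifying assumptions, not the deployed mechanism: the clause store and the protocol are Python dictionaries keyed by role, the expander is passed in as a parameter, and `toy_expand` is a hypothetical stand-in that treats a clause as a flat list of pending receive and send steps.

```python
def clause_for_role(protocol, role, store):
    """Expression 3: use the stored clause if present, else the protocol's."""
    if role in store:
        return store[role]
    return protocol[role]


def react(message, role, protocol, store, expand):
    """Expression 2: select a clause, expand it, and replace it in the store."""
    clause = clause_for_role(protocol, role, store)
    new_clause, outputs = expand(clause, {message}, protocol)
    new_store = dict(store)          # the caller's store is left untouched
    new_store[role] = new_clause
    return outputs, new_store


def toy_expand(clause, messages, protocol):
    """Hypothetical expander: consume one matching receive, then emit sends."""
    steps = list(clause)
    outputs = []
    if steps and steps[0][0] == "recv" and steps[0][1] in messages:
        steps.pop(0)
        while steps and steps[0][0] == "send":
            outputs.append(steps.pop(0)[1])
    return tuple(steps), outputs


role = ("r2", "a2")
protocol = {role: (("recv", "offer(2)"), ("send", "accept(2)"))}
outputs, store = react("offer(2)", role, protocol, {}, toy_expand)
# A second message for the same role now finds its clause in the store.
outputs2, store2 = react("offer(2)", role, protocol, store, toy_expand)
```

Passing the expander as an argument keeps the reactive wrapper independent of how expansion and constraint solving are realised.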

The reactive definition above is one of a range of ways that LCC protocols may be
used in coordinating distributed interactions. To date we have built two other forms of 
deployment mechanism: 

- A Java-based mechanism, implemented by Walton, in which a thread is created for 
each role and these threads then control the message passing. Although this paper 
views LCC from a logic programming perspective, this illustrates that one does not 
necessarily require a Prolog interpreter to compute with it. 

- A Prolog-based mechanism in which clause store, S, is not resident on each agent 
but carried with the protocol as messages are sent. This has the advantage that it 
requires no clause storage on agents but it works only for interactions that are linear, 
in the sense that at any given time only one agent has the protocol. In Section 7 we 
return to this form of interaction when dealing with finite domain restrictions on 
variables in protocols. 

LCC is intended to be as neutral as possible to the choice of mechanism for communicating
between agents so it is natural (and desirable) that several such mechanisms exist for 
it. It is not neutral, however, to the choice of constraint solver. In the next section we 
discuss in more detail the interplay between constraints and protocol. 

7 Interaction Scope = Constraint Space 

In previous sections we have not discussed in detail the way in which constraints stipu- 
lated in protocols are satisfied. The hooks given for this in Figure 2 (via the predicates 
satisfy(C) and satisfied(C)) tacitly assume a solver that would find a satisfiable
instance of constraint C. It is, however, well known that satisfying instances of con- 
straints too early can result in the wrong instance being selected. The price paid for this 
in standard logic programs is the computational cost of backtracking. In multi-agent 
interactions we do not have this option because messages sent to other agents remain 
sent and we cannot assume that agents having received messages are able to backtrack, 
since they may not be implemented in a language that supports backtracking. Hence 
if we rely entirely on satisfying instances of constraints our protocols are liable to be 
brittle. 

One remedy for brittleness is to have a more sophisticated view of mutual constraints
between agents (recall Section 4) in which we maintain a constraint space that
bounds the scope of the interaction. With a finite domain constraint solver, for instance,
this constraint space can be described by the range constraints on all the variables in the 
protocol clauses expanded by participating agents. Applying this to our initial example 
of mutual constraints in expression 1, if the range of values permitted for X by p(X)
is {1, 2, 3} while the range of values permitted for X by q(X) is {2, 3, 4} then, were
we to demand instances for variables in all constraints, the agent in role r1 might
(arbitrarily) choose p(1) and consequently send offer(1) to the agent in role r2. Unfortunately,
r2 would then need to satisfy q(1) in order to reply with an acceptance and
it cannot. The brittle protocol then, in the absence of a means of backtracking, can only
fail. Were we instead to use a finite domain solver we would obtain the range {1, 2, 3}
for p(X) and, since q(X) is constraining the same variable at the time it is called, this
would be reduced to {2, 3}, a range that would be attached to the variable returned in
the accept(X) message. By maintaining a constraint space we make our protocols less
brittle.
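The contrast between the brittle and the constraint-space readings of expression 1 can be replayed with ordinary Python sets standing in for finite domains. The variable names are illustrative, and taking the minimum is just one arbitrary way of committing early.

```python
p_domain = {1, 2, 3}   # values of X permitted by p(X), known to role r1
q_domain = {2, 3, 4}   # values of X permitted by q(X), known to role r2

# Brittle reading: r1 commits to a single instance before r2 is consulted.
early_choice = min(p_domain)
r2_can_accept = early_choice in q_domain

# Constraint-space reading: r1 sends the range, r2 narrows it with q(X).
offered_range = p_domain
accepted_range = offered_range & q_domain
```

Here the early choice is 1, which r2 cannot accept, whereas the narrowed range {2, 3} travels back with the accept(X) message and leaves both agents with consistent options.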

To maintain a constraint space during interaction between agents we must send, 
along with each message, a description of that space. For a finite domain solver this 
description might be in the form of variable ranges for each variable in the expanded 
protocol clauses. These ranges must then be applied on each expansion step, thus main- 
taining the ranges as constraints change. The simplest way of doing this, given that a 
protocol may be distributed across any number of agents, is to restrict our interactions 
to those which are linear (in the sense given at the end of Section 6) and send along with 
the protocol the clause store, S (this time defining the state of the interaction between 
all participants) and a set, V, containing the current restriction for each variable in the 
clauses of S. Individual agents then react to messages in a similar way to definition 2,
except in this case the react definition follows a sequence of messages, M_i to M_n, between
(possibly different) agents, X_i in role R_i to X_j in role R_j, with each instance of
react changing both the clause store S and its variable restrictions, V.

(react(M_i, X_i, R_i, V, S_i, V_i, M_{i+1}, S_{i+1}, V_{i+1}), ...,
 react(M_{n-1}, X_j, R_j, V, S_{n-1}, V_{n-1}, M_n, S_n, V_n))                (4)

In expression 4 the fourth argument of each react is the protocol, while the subscripted
V_i are the successive sets of variable restrictions.

In Section 9 we give a detailed example of this sort of mechanism in operation, 
where the variable restrictions are finite domain restrictions and the constraint solver 
is a finite domain solver. Before reaching this example we cover the final concept we 
consider essential for open agent systems: the ability to broker interactions. 

8 Brokering = Meta-variables

A broker is a kind of Web service that, upon being asked by a client to suggest a col- 
laboration appropriate for some task, will send that client a description of the dialogue 
with which the client can initiate that collaboration. Brokering is required in open agent 
systems, where agents newly entering an environment may not know the forms of so- 
cial norm expected and may not even know which agents are available. A basic, generic 
definition of brokering is represented succinctly using expressions 5 and 6 below. 

A broker, B, can receive a request for a protocol for a task, T, and will send the 
protocol V if it has it. 
a(broker, B) ::
    ask(send_protocol(T)) ⇐ a(client(B), A) then
    inform(protocol(T, V)) ⇒ a(client(B), A) ← protocol_for_task(A, T, V)     (5)

A client, A, for broker, B, can send a request for a protocol for a task, T; it then
receives the protocol segment V and then continues its protocol by following V.

a(client(B), A) ::
    ask(send_protocol(T)) ⇒ a(broker, B) ← have_task(T) then
    inform(protocol(T, V)) ⇐ a(broker, B) then
    V                                                                         (6)

The generality of this definition of brokering comes from the last step in clause 6 
which allows a protocol sent via a message to be inserted into the protocol followed by 
the recipient of that message. This is similar to the construction of executable goals in 
logic programming languages, but moved to a distributed dialogue setting. 
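A rough analogue of clauses 5 and 6 is sketched below. The assumptions are considerable: the broker's protocol_for_task constraint is reduced to a dictionary lookup, a protocol is a list of step strings, and "following V" is modelled simply as returning those steps for the client to execute. The task name and function names are invented for the example.

```python
def broker(protocols, task):
    """Clause 5: reply to ask(send_protocol(T)) with inform(protocol(T, V))."""
    if task in protocols:                      # stands in for protocol_for_task
        return ("inform", ("protocol", task, protocols[task]))
    return None                                # no protocol known for the task


def client(task, ask_broker):
    """Clause 6: ask for a protocol, receive it, then continue by following it."""
    reply = ask_broker(task)
    if reply is None:
        return []
    _performative, (_functor, _task, protocol) = reply
    return list(protocol)                      # the received V becomes the plan


known = {
    "buy_pc": [
        "ask(buy(pc)) => a(vendor, s1)",
        "a(neg_customer(pc, s1, []), b1)",
    ]
}
plan = client("buy_pc", lambda t: broker(known, t))
no_plan = client("book_flight", lambda t: broker(known, t))
```

The point of interest is the last line of `client`: the value received in a message is itself treated as the remainder of the client's behaviour, which is the meta-variable step of clause 6.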

Notice also that this form of brokering appears to be compatible with forms of bro- 
kering and matchmaking developed elsewhere (for example [13]). The point at which 
matchmaking occurs in the protocol above is when the protocol_for_task(A, T, V)
constraint is solved in clause 5, producing a protocol, V, for collaboration to solve task 
T. Different matchmaking algorithms solve this constraint with differing levels of so- 
phistication: 

- “Yellow pages” brokers normally allow only propositional tasks and would return 
as V a protocol only of the form: 

  ask(Q) ⇒ a(R, S) then inform(A) ⇐ a(R, S)

where Q is a query appropriate for the task, T; S is the identifier of the agent who
may answer that query while in role R; and A is the answer obtained in response.

- Brokers returning linear sequences of agent interactions (e.g. [14]) generalise yellow
pages brokering by offering more than one query-response interaction in performing
a task, so V can in this case be of the form:

  ask(Q_1) ⇒ a(R_1, S_1) then inform(A_1) ⇐ a(R_1, S_1) then ...
  ... then ask(Q_n) ⇒ a(R_n, S_n) then inform(A_n) ⇐ a(R_n, S_n)

where n is the number of agents needed to perform the task. More sophisticated 
brokers of this type can generate conditional messages in V to deal with constraints 
such as ontology translation between terms in messages. 

- Brokers exist for assembling more complex structures for V. Those of which we are 
aware assume that process specifications for each individual service are available 
(expressed in a language such as DAML-S) and use a planning system to compose 
these service components into a plan for service invocation. Examples of such systems
include the planning component of the RETSINA system [13] and the SHOP2
system applied to DAML-S [15]. Protocols in our LCC language could be viewed 
as a form of plan that might be constructed using methods analogous to those of 
the RETSINA and SHOP2 experiments. It is not yet clear, however, whether these 
plan-based composition methods can be applied directly to composition of LCC 
protocols. 
A customer, C, can send a request to vendor, V, to buy an item, X, that the customer needs and
believes the vendor sells. Then the customer takes the role of negotiator with the vendor.

a(customer, C) ::
    ask(buy(X)) ⇒ a(vendor, V) ← need(X) ∧ sells(X, V) then
    a(neg_customer(X, V, []), C)                                              (7)

A negotiating customer with a set, S, of negotiated attributes of the desired item, X, either
receives an offer of a new attribute, A, and accepts that (continuing in the negotiating role with A
added to S) or it receives a request to commit to the current set of negotiated attributes and replies
with the constraints, Ca, it wishes to impose on those attributes; then receives confirmation from
the vendor of the final values, F, for the attributes once the customer's constraints have been
applied at the vendor’s side.

a(neg_customer(X, V, S), C) ::
    ( offer(A) ⇐ a(neg_vendor(X, C, _), V) then
      accept(A) ⇒ a(neg_vendor(X, C, _), V) ← acceptable(A) then
      a(neg_customer(X, V, [att(A)|S]), C) )
    or
    ( ask(commit) ⇐ a(neg_vendor(X, C, _), V) then
      tell(commit(S, Ca)) ⇒ a(neg_vendor(X, C, _), V) ← choose(S, Ca) then
      tell(sold(F)) ⇐ a(neg_vendor(X, C, _), V) )                             (8)

A vendor, V, receives a request from a customer, C, to buy an item, X; then takes the role of
negotiator with the customer over the attribute set, S, that applies to that item.

a(vendor, V) ::
    ask(buy(X)) ⇐ a(customer, C) then
    a(neg_vendor(X, C, S), V) ← attributes(X, S)                              (9)

A negotiating vendor with a set, S, of negotiable attributes of the desired item, X, either takes
the first element, A, of S and offers it to the customer for acceptance (continuing then in its
negotiating role with the remaining attributes, T) or if S is empty it asks the customer to commit
to the attributes they have discussed and receives the customer's constraints, Ca, on the final
values of those attributes then, if these are satisfiable, it informs the customer of the final attribute
values, F, for the sold item.

a(neg_vendor(X, C, S), V) ::
    ( offer(A) ⇒ a(neg_customer(X, V, _), C) ← S = [A|T] ∧ available(A) then
      accept(A) ⇐ a(neg_customer(X, V, _), C) then
      a(neg_vendor(X, C, T), V) )
    or
    ( ask(commit) ⇒ a(neg_customer(X, V, _), C) ← S = [] then
      tell(commit(F, Ca)) ⇐ a(neg_customer(X, V, _), C) then
      tell(sold(F)) ⇒ a(neg_customer(X, V, _), C) ← Ca )                      (10)

Fig. 3. Protocol for our example
9 Example Combining Sections 3 to 8

We now combine the elements introduced earlier to demonstrate how they apply to the 
scenario given at the end of Section 1. Figure 3 describes the protocol for interaction
between customer and vendor.

The agents involved in the protocol of Figure 3 must be capable of satisfying the 
constraints it imposes. Some of the axioms used in constraint solving might be standard 
for all agents, in which case they are shared among all agents (and propagated with 
the protocol). The definition necessary to choose constraints on attributes is an example 
of this sort of standardisation, since all agents would be assumed to have precisely the 
same interpretation of choose/2, which constructs a conjunctive constraint on attribute
values from a set of attribute names. Expression 11 gives a definition, which assumes
that a predicate choice/2 that determines each constraint is satisfiable by the agent
asked to choose.

choose([att(Att)|T], C ∧ R) ← choice(Att, C) ∧ choose(T, R)
choose([], true)                                                              (11)
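The recursion in expression 11 can be followed in a few lines. In this sketch, which is only an analogue of the Horn clauses, attributes are strings, the conjunction C ∧ R is a nested tuple tagged "and", and the `choice` function is a hypothetical stand-in for the customer's private choice/2 definitions.

```python
def choose(attributes, choice):
    """Build a right-nested conjunction of one constraint per attribute."""
    if not attributes:
        return "true"                               # choose([], true)
    first, rest = attributes[0], attributes[1:]
    return ("and", choice(first), choose(rest, choice))


def choice(attribute):
    """Hypothetical private policy: only the price is actively constrained."""
    return "minimise(P)" if attribute == "price" else "true"


constraint = choose(["price", "monitor_size", "disk_space"], choice)
```

The result keeps the trailing true terms exactly as the clausal definition would; a solver would simply treat them as trivially satisfied conjuncts.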

As an example of knowledge private to an agent, we now define for the customer 
the ranges of acceptable values for attributes of the personal computer under discussion. 
For instance, the customer would accept disk space of 40 or above. We also define how 
the specific values for attributes are chosen by the customer from the ranges agreed via 
earlier dialogue with the vendor: the maximum from the range being taken for every 
attribute except for price which is minimised. 

need(pc)
sells(pc, s1)
acceptable(disk_space(D)) ← D in 40..sup
acceptable(monitor_size(M)) ← M in 15..sup
acceptable(price(_, _, P)) ← P in 800..2000
choice(disk_space(D), true)
choice(monitor_size(M), true)
choice(price(_, _, P), minimise(P))                                           (12)

The vendor agent’s local constraints are defined in a similar way to that of the cus- 
tomer. We define the available ranges for the attributes needed to configure a PC and 
relate these to its price via a simple equation (the aim being to demonstrate the principle 
of relating constraints rather than to have a realistic pricing policy). 

attributes(pc, [disk_space(D), monitor_size(M), price(D, M, P)])
available(disk_space(D)) ← D in 40..80
available(monitor_size(M)) ← M in 15..18
available(price(D, M, P)) ← 1000 + ((M - 15) * 100) + ((D - 40) * 10) #= P
minimise(V) ← fd_min(V, Vm) ∧ V in Vm..Vm                                     (13)
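The price range that results from combining expressions 12 and 13 can be checked by brute-force enumeration. This sketch enumerates the finite domains with Python ranges instead of using a finite domain solver, so it illustrates the arithmetic only; the variable names are ours.

```python
disk_sizes = range(40, 81)         # vendor: D in 40..80 (customer: 40..sup)
monitor_sizes = range(15, 19)      # vendor: M in 15..18 (customer: 15..sup)
customer_prices = range(800, 2001) # customer: P in 800..2000


def price(d, m):
    """The vendor's pricing equation from expression 13."""
    return 1000 + (m - 15) * 100 + (d - 40) * 10


vendor_prices = {price(d, m) for d in disk_sizes for m in monitor_sizes}
feasible = {p for p in vendor_prices if p in customer_prices}

lowest, highest = min(feasible), max(feasible)
minimised = lowest                 # the effect of minimise(P) on the range
```

The bounds come out as 1000 and 1700, and minimising the price yields 1000, which is the figure quoted for the final sale in the dialogue.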

The sequence of message passing that follows from the protocol of Figure 3 and the 
constraints of expressions 11, 12 and 13 is shown below. The dialogue iterates between
a customer, b1, and a vendor, s1. Each illocution shows: the type of the agent sending
the message; the message itself; the type of agent to which the message is sent; and
the variable restrictions applying to the message (the term r(V, C) relating a finite domain
constraint C to a variable V). The first illocution is the customer making initial
contact with the vendor. Illocutions two to seven then are offers of ranges for attributes
(disk_space, monitor_size and price), each of which is accepted by the customer. At
illocution eight the vendor, which has worked through all its relevant attributes, asks
for commitment from the customer. In reply, the customer asks the vendor to minimise
its price (the variable C in illocution nine). Finally, the vendor offers a sale at a price
of 1000 (it having decided simply to take the minimum of the current range restriction
on C). The variables for disk_space and monitor_size (A and B respectively) are
left without commitment, although the vendor could have committed to these if it so
desired.

Sender : a(customer, b1)
Message : ask(buy(pc))
Recipient : a(vendor, s1)
Restrictions : []

Sender : a(neg_vendor(pc, b1, [disk_space(A), monitor_size(B), price(A, B, _)]), s1)
Message : offer(disk_space(A))
Recipient : a(neg_customer(pc, s1, _), b1)
Restrictions : [r(A, [[40|80]])]

Sender : a(neg_customer(pc, s1, []), b1)
Message : accept(disk_space(A))
Recipient : a(neg_vendor(pc, b1, _), s1)
Restrictions : [r(A, [[40|80]])]

Sender : a(neg_vendor(pc, b1, [monitor_size(A), price(B, A, _)]), s1)
Message : offer(monitor_size(A))
Recipient : a(neg_customer(pc, s1, _), b1)
Restrictions : [r(B, [[40|80]]), r(A, [[15|18]])]

Sender : a(neg_customer(pc, s1, [att(disk_space(A))]), b1)
Message : accept(monitor_size(B))
Recipient : a(neg_vendor(pc, b1, _), s1)
Restrictions : [r(B, [[15|18]]), r(A, [[40|80]])]

Sender : a(neg_vendor(pc, b1, [price(A, B, C)]), s1)
Message : offer(price(A, B, C))
Recipient : a(neg_customer(pc, s1, _), b1)
Restrictions : [r(C, [[1000|1700]]), r(B, [[15|18]]), r(A, [[40|80]])]

Sender : a(neg_customer(pc, s1, [att(monitor_size(A)), att(disk_space(B))]), b1)
Message : accept(price(B, A, C))
Recipient : a(neg_vendor(pc, b1, _), s1)
Restrictions : [r(C, [[1000|1700]]), r(B, [[40|80]]), r(A, [[15|18]])]

Sender : a(neg_vendor(pc, b1, []), s1)
Message : ask(commit)
Recipient : a(neg_customer(pc, s1, _), b1)
Restrictions : []

Sender : a(neg_customer(pc, s1, [att(price(A, B, C)), att(monitor_size(B)), att(disk_space(A))]), b1)
Message : tell(commit([att(price(A, B, C)), att(monitor_size(B)), att(disk_space(A))], minimise(C)))
Recipient : a(neg_vendor(pc, b1, _), s1)
Restrictions : [r(C, [[1000|1700]]), r(B, [[15|18]]), r(A, [[40|80]])]

Sender : a(neg_vendor(pc, b1, []), s1)
Message : tell(sold([att(price(A, B, 1000)), att(monitor_size(B)), att(disk_space(A))]))
Recipient : a(neg_customer(pc, s1, _), b1)
Restrictions : [r(B, [[15|18]]), r(A, [[40|80]])]

Recall that each agent maintains an appropriate role in the interaction by expanding
the clause it selects for its initial role (see Section 5). The term below is the fully
expanded clause used by agent b1 in the role of a customer. By following through the
nesting in this clause, the reader may reconstruct the expansions of the initial customer
clause (clause 7 in Figure 3) performed using the transformations of Figure 2 and observe
that this allows the message sequence for the customer described above.



a(customer, b1) ::
  c(ask(buy(pc)) ⇒ a(vendor, s1)) then
  (a(neg_customer(pc, s1, []), b1) ::
    c(offer(disk_space(A)) ⇐
        a(neg_vendor(pc, b1, [disk_space(A), monitor_size(B), price(A, B, 1000)]), s1)) then
    c(accept(disk_space(A)) ⇒
        a(neg_vendor(pc, b1, [disk_space(A), monitor_size(B), price(A, B, 1000)]), s1)) then
    (a(neg_customer(pc, s1, [att(disk_space(A))]), b1) ::
      c(offer(monitor_size(B)) ⇐
          a(neg_vendor(pc, b1, [monitor_size(B), price(A, B, 1000)]), s1)) then
      c(accept(monitor_size(B)) ⇒
          a(neg_vendor(pc, b1, [monitor_size(B), price(A, B, 1000)]), s1)) then
      (a(neg_customer(pc, s1, [att(monitor_size(B)), att(disk_space(A))]), b1) ::
        c(offer(price(A, B, 1000)) ⇐ a(neg_vendor(pc, b1, [price(A, B, 1000)]), s1)) then
        c(accept(price(A, B, 1000)) ⇒ a(neg_vendor(pc, b1, [price(A, B, 1000)]), s1)) then
        (a(neg_customer(pc, s1, [att(price(A, B, 1000)), att(monitor_size(B)),
                                 att(disk_space(A))]), b1) ::
          c(ask(commit) ⇐ a(neg_vendor(pc, b1, []), s1)) then
          c(tell(commit([att(price(A, B, 1000)), att(monitor_size(B)), att(disk_space(A))],
                        minimise(1000))) ⇒
              a(neg_vendor(pc, b1, []), s1)) then
          c(tell(sold([att(price(A, B, 1000)), att(monitor_size(B)), att(disk_space(A))])) ⇐
              a(neg_vendor(pc, b1, []), s1))))))



Although this example is compact it demonstrates capabilities beyond standard se- 
mantic web service specification languages (such as OWL-S), which do not allow recur- 
sion over data structures and therefore could not represent a recursive negotiation like 
the one in this example. It also allows the protocol of Figure 3 to be brokered to any 
agent asking for it (that brokering interaction also being represented using clauses 5 
and 6 of Section 8) - a capability not possessed by other forms of computation for 
agent social norms. Finally, it demonstrates a simple way of managing finite domain 
constraints across multi-agent interactions. 







10 Conclusions 

A criticism of the LCC approach from the mainstream semantic web or agent com- 
munities might be that it too closely resembles logic programming. It is true that the 
language described in this paper uses data structures familiar to logic programmers but 
those that are highly specific to Prolog (e.g. the list expressions used) are not
essential to LCC and could be replaced by others according to taste. The essence of LCC
is its mixture of process calculus and Horn clauses. Both of these aspects do appear 
in the mainstream (though sometimes heavily disguised), for example in the process 
component of OWL-S specifications or in the rule-based reasoners being constructed to 
supplement Description Logic reasoners for semantic web services. The advantage of 
being closer to logic programming than is fashionable in the mainstream is that we are 
able to make our specifications executable through simple, well known methods that 
(because they are well established) we know can be taught to engineers of more tradi- 
tional systems. Many comparable specification languages in the semantic web services 
domain do not possess this advantage. 

The emphasis of this paper is on the way in which we adapt traditional methods 
to this new application. The agents research group at Edinburgh is developing LCC in 
ways we shall describe in other papers: 

- Walton ([11]) has produced a translator from a variant of the language to Promela, 
allowing him to model check protocols using the SPIN model checker. 

- McGinnis ([16, 17]) is exploring how to make interactions more adaptable by al- 
lowing transformations to the protocol by participating agents, leading to notions 
of “safe” adaptations. 

- Barker is applying LCC to the problem of experiment coordination on e-science 
grids, requiring him to reconcile the data-flow paradigm assumed by many grid 
service architectures with the messaging processes of LCC. 

- Guo ([18]) is studying how to translate to LCC from traditional business process 
modelling languages. 

- Hassan ([19]) is developing more sophisticated forms of constraint management 
beyond those described in the current paper. 

Ultimately, our aim is to produce a single form of specification that supports
specification, analysis and modelling for complex, coordinated, multi-agent systems.

Abstract. Current literature offers a number of different approaches to
what could generally be called “probabilistic logic programming”. These
are usually based on Horn clauses. Here, we introduce a new formalism,
Logic Programs with Annotated Disjunctions, based on disjunctive
logic programs. In this formalism, each of the disjuncts in the head of
a clause is annotated with a probability. Viewing such a set of probabilistic
disjunctive clauses as a probabilistic disjunction of normal logic
programs allows us to derive a possible world semantics, more precisely,
a probability distribution on the set of all Herbrand interpretations. We
demonstrate the strength of this formalism by some examples and compare
it to related work.

1 Introduction 

The study of the rules which govern human thought has, apart from traditional 
logics, also given rise to logics of probability [10]. As was the case with first 
order logic and logic programming, attempts have been made to derive more 
“practical” formalisms from these probabilistic logics. Research in this field of 
“probabilistic logic programming” has mostly focused on ways in which proba- 
bilistic elements can be added to Horn clause programs. We, however, introduce 
in this work a formalism which is based on disjunctive logic programming [15]. 

This is a natural choice, as disjunctions themselves - and therefore disjunctive 
logic programs - already represent a kind of uncertainty. Indeed, they can, to give 
just one example, be used to model indeterminate effects of actions. Consider 
for instance the following disjunctive clause: 

heads(Coin) ∨ tails(Coin) ← toss(Coin).

This clause offers quite an intuitive representation of the fact that tossing a coin 
will result in either heads or tails. Of course, this is not all we know. Indeed, if a 
coin is not biased, we know that it has equal probability of landing on heads or 
tails. In the formalism of Logic Programs with Annotated Disjunctions, or LPADs,
this can be expressed by annotating the disjuncts in the head of such a clause
with a probability, i.e.

(heads(Coin) : 0.5) ∨ (tails(Coin) : 0.5) ← toss(Coin), ¬biased(Coin).

Such a clause expresses the fact that for each coin c, precisely one of the following
clauses will hold: heads(c) ← toss(c), ¬biased(c), i.e. the unbiased coin c will
land on heads when tossed, or tails(c) ← toss(c), ¬biased(c), i.e. the unbiased
coin c will land on tails when tossed. Both these clauses have a probability of
0.5.

Such annotated disjunctive clauses can be combined to model more compli- 
cated situations. Consider for instance the following LPAD: 

(heads(Coin) : 0.5) ∨ (tails(Coin) : 0.5) ← toss(Coin), ¬biased(Coin).

(heads(Coin) : 0.6) ∨ (tails(Coin) : 0.4) ← toss(Coin), biased(Coin).

(fair(coin) : 0.9) ∨ (biased(coin) : 0.1).

(toss(coin) : 1).

Similarly to the first clause, the second clause of the program expresses that 
a biased coin lands on heads with probability 0.6 and on tails with probability 
0.4. The third clause says that a certain coin, coin, has a probability of 0.9 of 
being fair and a probability of 0.1 of being biased; the fourth clause says that 
coin is certainly (with probability 1) tossed. 

As mentioned previously, each ground instantiation of an annotated dis- 
junctive clause represents a probabilistic choice between several non-disjunctive 
clauses. Similarly, each ground instantiation of an LPAD represents a proba- 
bilistic choice between several non-disjunctive logic programs, which are called 
instances of the LPAD. This intuition can be used to define a probability
distribution on the set of Herbrand interpretations of an LPAD: the probability of a
certain interpretation I is the sum of the probabilities of all instances for which
I is a model. As in [11], this probability distribution defines the semantics of a program.

These notions will be formalized in Section 2, where we describe the syntax 
and semantics of LPADs. We illustrate our formalism further by presenting some 
examples in Section 3: it is shown how a Hidden Markov Model and Bayesian 
network can be represented by an LPAD, and how LPADs can represent actions 
with uncertain effects in a situation calculus setting. In Section 4 we give an 
overview of, and compare our work with, existing formalisms for probabilistic 
logic programming. It is shown that, while the semantics of LPADs is similar to 
that of some existing approaches, they do offer significant advantages by
providing a natural way of representing relational probabilistic knowledge and as such
constitute a useful contribution to the field of probabilistic logic programming.
We conclude and discuss future work in Section 5. An extended version of this 
paper is given in [26]. 

2 Logic Programs with Annotated Disjunctions 

A Logic Program with Annotated Disjunctions consists of a set of rules of the
following form:

$(h_1 : \alpha_1) \lor \cdots \lor (h_n : \alpha_n) \leftarrow b_1, \ldots, b_m.$ (1)

Here, the $h_i$ and $b_i$ are, respectively, atoms and literals of some language and
the $\alpha_i$ are real numbers in the interval [0, 1], such that
$\sum_{i=1}^{n} \alpha_i = 1$. For a rule r of this form, the set
$\{(h_i : \alpha_i) \mid 1 \le i \le n\}$ will be denoted as head(r), while
body(r) = $\{b_i \mid 1 \le i \le m\}$. If head(r) contains only one element (a : 1),
we will simply write this element as a.

We will denote the set of all ground LPADs as $\mathcal{P}_G$.

The semantics of an LPAD is defined using its grounding. For the remainder
of this section, we therefore restrict our attention to ground LPADs. Furthermore,
in providing a formal semantics for such a program $P \in \mathcal{P}_G$, we will, in
keeping with logic programming tradition [14], also restrict our attention to its
Herbrand base $H_B(P)$ and consequently to the set of all its Herbrand
interpretations $\mathcal{I}_P = 2^{H_B(P)}$. In keeping with [11], the semantics of an
LPAD will be defined by a probability distribution on $\mathcal{I}_P$:

Definition 1. Let P be in $\mathcal{P}_G$. An admissible probability distribution $\pi$ on
$\mathcal{I}_P$ is a mapping from $\mathcal{I}_P$ to real numbers in [0, 1], such that
$\sum_{I \in \mathcal{I}_P} \pi(I) = 1$.

We would now like to select one of these admissible probability distributions 
as our intended semantics. To illustrate this process, we consider the grounding 
of the example presented in the introduction: 

(heads(coin) : 0.5) ∨ (tails(coin) : 0.5) ← toss(coin), ¬biased(coin).

(heads(coin) : 0.6) ∨ (tails(coin) : 0.4) ← toss(coin), biased(coin).

(fair(coin) : 0.9) ∨ (biased(coin) : 0.1).

toss(coin).

As already mentioned in the introduction, each of these ground clauses represents 
a probabilistic choice between a number of non-disjunctive clauses. By choosing 
one of the possibilities for each clause, we get a non-disjunctive logic program, 
for instance: 



heads(coin) ← toss(coin), ¬biased(coin).
heads(coin) ← toss(coin), biased(coin).
fair(coin).
toss(coin).

Such a program is called an instance of the LPAD. Note that this LPAD has
2 · 2 · 2 = 8 different instances. Such an instance can be assigned a probability
by assuming independence between the different choices. This is a reasonable
assumption to make, because - as in classical logic programming - it should be
possible to read each clause independently from the others; dependence should
be modeled within one clause. Indeed, in our example it makes perfect sense to
assume that the probability of a non-biased coin landing on heads is independent
of the probability of a biased coin landing on heads and of the probability of
a certain coin being fair. As such, the probability of the above instance of the
example is 0.5 · 0.6 · 0.9 · 1 = 0.27.

We now formalize the above ideas. The notion of a selection function formal- 
izes the idea of choosing, for each ground instantiation of a rule in an LPAD, 
one of the atoms in its head. 

Definition 2. Let P be a program in $\mathcal{P}_G$. A selection $\sigma$ is a function which
selects one pair $(h : \alpha)$ from each rule of P, i.e.
$\sigma : P \to (H_B(P) \times [0, 1])$ such that for each r in P, $\sigma(r) \in head(r)$.
For each rule r, we denote the atom h selected from this rule by $\sigma_{atom}(r)$
and the selected probability $\alpha$ by $\sigma_{prob}(r)$.
Furthermore, we denote the set of all selections $\sigma$ by $\mathcal{S}_P$.

Each selection $\sigma$ defines an instance of the LPAD.

Definition 3. Let P be a program in $\mathcal{P}_G$ and $\sigma$ a selection in $\mathcal{S}_P$.
The instance $P_\sigma$ chosen by $\sigma$ is obtained by keeping only the atom selected
for r in the head of each rule $r \in P$, i.e.
$P_\sigma = \{ \text{“}\sigma_{atom}(r) \leftarrow body(r)\text{”} \mid r \in P \}$.

The process of defining the semantics of an LPAD through its instances, is 
similar to how so-called split programs are used in [21] to define the possible model 
semantics for (non-probabilistic) disjunctive logic programs. The main difference 
is that a split program is allowed to contain more than one non-disjunctive 
clause for each original disjunctive clause, as the possible model semantics aims 
to capture both the exclusive and inclusive interpretations of disjunction. In 
contrast, a probabilistic rule in an LPAD expresses the fact that exactly one atom 
in the head holds (with a certain probability) as a consequence of the body of 
the rule being true. Of course, in such a semantics, the inclusive interpretation of 
disjunctions can be simulated by adding additional atoms to explicitly represent 
the conjunction of two or more of the original disjuncts. It is worth noting 
that the semantics of the preferential reasoning formalism Logic Programs with
Ordered Disjunctions [5] is also defined using a similar notion of instances¹.

Next, we assign a probability to each selection $\sigma$ in $\mathcal{S}_P$, which induces a
probability on the corresponding program $P_\sigma$. As motivated above, we assume
independence between the selections made for each rule.

Definition 4. Let P be a program in $\mathcal{P}_G$. The probability of a selection
$\sigma$ in $\mathcal{S}_P$ is the product of the probabilities of the individual choices
made by that selection, i.e.

$$C_\sigma = \prod_{r \in P} \sigma_{prob}(r).$$
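Definitions 2 and 4 can be illustrated on the ground coin program with a small
Python sketch. The encoding of rules as Python tuples and the names RULES,
selections and selection_probability are assumptions made for this
illustration only; they are not part of the formalism.

```python
from itertools import product
from math import prod, isclose

# Ground coin LPAD from the text. Each rule is (head, body); the head is a
# list of (atom, probability) pairs. Bodies play no role in C_sigma.
RULES = [
    ([("heads(coin)", 0.5), ("tails(coin)", 0.5)], ["toss(coin)", "not biased(coin)"]),
    ([("heads(coin)", 0.6), ("tails(coin)", 0.4)], ["toss(coin)", "biased(coin)"]),
    ([("fair(coin)", 0.9), ("biased(coin)", 0.1)], []),
    ([("toss(coin)", 1.0)], []),
]

def selections(rules):
    """Yield every selection: one (atom, probability) pair per rule."""
    return product(*(head for head, _body in rules))

def selection_probability(selection):
    """C_sigma: the product of the probabilities picked by the selection."""
    return prod(p for _atom, p in selection)

all_selections = list(selections(RULES))
total = sum(selection_probability(s) for s in all_selections)

# The selection behind the instance shown earlier in this section.
example = (("heads(coin)", 0.5), ("heads(coin)", 0.6),
           ("fair(coin)", 0.9), ("toss(coin)", 1.0))
print(len(all_selections), total, selection_probability(example))
```

Because every head sums to 1, the probabilities of all selections also sum to 1,
which is what later makes the induced distribution on interpretations admissible.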

The instances of an LPAD are normal logic programs. The meaning of such
programs is given by their models under a certain formal semantics. For example,
all common semantics for logic programs agree that the meaning of the
above instance of the coin-program is given by the Herbrand interpretation
{toss(coin), fair(coin), heads(coin)}. The instances of an LPAD therefore define
a probability distribution on the set of interpretations of the program. More
precisely, the probability of a certain interpretation I is the sum of the
probabilities of all instances for which I is a model.

¹ We would like to thank an anonymous reviewer of a previous draft for pointing this
out to us.

Returning to the example, there is one other instance of this LPAD which
has {toss(coin), fair(coin), heads(coin)} as its model, namely

heads(coin) ← toss(coin), ¬biased(coin).
tails(coin) ← toss(coin), biased(coin).
fair(coin).
toss(coin).

The probability of this instance is 0.5 · 0.4 · 0.9 · 1 = 0.18. Therefore the probability
of the interpretation {toss(coin), fair(coin), heads(coin)} is 0.5 · 0.4 · 0.9 · 1 +
0.5 · 0.6 · 0.9 · 1 = 0.5 · (0.4 + 0.6) · 0.9 · 1 = 0.45.

Of course, there are a number of ways in which the semantics of a non- 
disjunctive logic program can be defined. In our framework, uncertainty is mod- 
eled by annotated disjunctions. Therefore, a non-disjunctive program should 
contain no uncertainty, i.e. it should have a single two-valued model. Indeed, 
this is the only way in which an LPAD can be seen as specifying a unique proba- 
bility distribution, without assuming that the “user” meant to say something he 
did not actually write. Consider for instance the program: {a ← ¬b. b ← ¬a.}.
Any reasonable probability distribution specified by this program would have
to assign a probability α to the interpretation {a} and 1 - α to {b}. However,
if such a probability distribution were intended, one would simply have written:
(a : α) ∨ (b : 1 - α).

Therefore, we will take the meaning of an instance $P_\sigma$ of an LPAD to be
given by its well founded model $WFM(P_\sigma)$ [24] and require that all these well
founded models are two-valued. If, for instance, the LPAD is acyclic (meaning
that all its instances are acyclic [1]), this will always be the case.

Definition 5. An LPAD P is called sound iff for each selection $\sigma$ in $\mathcal{S}_P$, the
well founded model of the program $P_\sigma$ chosen by $\sigma$ is two-valued.
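The two-valuedness requirement can be checked mechanically. The sketch below
computes the well founded model of a ground normal program with the standard
alternating-fixpoint construction; this is one known way of computing it, not
necessarily the one the authors have in mind. The rule encoding (head, positive
body, negative body) and all function names are assumptions of this
illustration.

```python
def least_model(rules, assumed_true):
    """Least model of the reduct of `rules` with respect to `assumed_true`.

    A rule is (head, positive_body, negative_body). It survives the reduct
    when none of its negated atoms belongs to `assumed_true`.
    """
    reduct = [(h, pos) for h, pos, neg in rules if not (set(neg) & assumed_true)]
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in reduct:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return model

def well_founded_model(rules):
    """Return (true_atoms, undefined_atoms) via the alternating fixpoint."""
    true_atoms = set()
    while True:
        possible = least_model(rules, true_atoms)   # overestimate of the truths
        new_true = least_model(rules, possible)     # underestimate of the truths
        if new_true == true_atoms:
            return true_atoms, possible - true_atoms
        true_atoms = new_true

def is_two_valued(rules):
    """True when the well founded model leaves no atom undefined."""
    return not well_founded_model(rules)[1]

# The instance of the coin program shown earlier in this section.
COIN_INSTANCE = [
    ("heads(coin)", ["toss(coin)"], ["biased(coin)"]),
    ("heads(coin)", ["toss(coin)", "biased(coin)"], []),
    ("fair(coin)", [], []),
    ("toss(coin)", [], []),
]
# The program {a <- not b. b <- not a.} discussed above.
CYCLE = [("a", [], ["b"]), ("b", [], ["a"])]

print(well_founded_model(COIN_INSTANCE))
print(well_founded_model(CYCLE))
```

For the coin instance no atom is left undefined, whereas for the two-rule
program both a and b stay undefined, so an LPAD having it as an instance would
not be sound in the sense of Definition 5.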

The probabilities on the elements $\sigma$ of $\mathcal{S}_P$ are then naturally extended to
probabilities on interpretations. The following distribution $\pi^*_P$ gives the
semantics of an LPAD P.

Definition 6. Let P be a sound LPAD in $\mathcal{P}_G$. For each of its interpretations I
in $\mathcal{I}_P$, the probability $\pi^*_P(I)$ assigned by P to I is the sum of the
probabilities of all selections which lead to I, i.e. with S(I) being the set of all
selections $\sigma$ for which $WFM(P_\sigma) = I$:

$$\pi^*_P(I) = \sum_{\sigma \in S(I)} C_\sigma.$$

It is easy to show that - for a sound LPAD - this distribution $\pi^*_P$ is indeed
an admissible probability distribution [26].

Proposition 1. Let P be a sound LPAD in $\mathcal{P}_G$. Then $\pi^*_P$ is an admissible
probability distribution.
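As a sanity check of Definition 6 and Proposition 1 on the coin example, the
following sketch enumerates all selections, computes the model of each instance
and accumulates the selection probabilities per interpretation. It makes a
simplifying assumption: the program is stratified, and each rule carries a
hand-assigned stratum number so that negation can be evaluated against the
lower strata (for stratified programs this yields the well founded model). The
stratum field and all names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import prod, isclose

# (head alternatives, positive body, negative body, stratum)
RULES = [
    ([("heads(coin)", 0.5), ("tails(coin)", 0.5)], ["toss(coin)"], ["biased(coin)"], 1),
    ([("heads(coin)", 0.6), ("tails(coin)", 0.4)], ["toss(coin)", "biased(coin)"], [], 1),
    ([("fair(coin)", 0.9), ("biased(coin)", 0.1)], [], [], 0),
    ([("toss(coin)", 1.0)], [], [], 0),
]

def model_of_instance(instance):
    """Model of a stratified instance, evaluated stratum by stratum."""
    model = set()
    for stratum in sorted({rule[3] for rule in instance}):
        layer = [rule for rule in instance if rule[3] == stratum]
        changed = True
        while changed:
            changed = False
            for head, pos, neg, _ in layer:
                if head not in model and set(pos) <= model and not (set(neg) & model):
                    model.add(head)
                    changed = True
    return frozenset(model)

def distribution(rules):
    """Map each interpretation to the summed probability of its selections."""
    dist = defaultdict(float)
    for choice in product(*(rule[0] for rule in rules)):
        instance = [(atom, rule[1], rule[2], rule[3])
                    for (atom, _p), rule in zip(choice, rules)]
        dist[model_of_instance(instance)] += prod(p for _atom, p in choice)
    return dict(dist)

dist = distribution(RULES)
for interpretation, probability in dist.items():
    print(sorted(interpretation), round(probability, 4))
```

Under these assumptions only four interpretations receive non-zero probability,
and the one containing fair(coin) and heads(coin) receives 0.45, matching the
hand computation above.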

There is a strong connection between the interpretations $I \in \mathcal{I}_P$ for which
$\pi^*_P(I) > 0$ and traditional semantics for disjunctive logic programs. First of all,
each such interpretation I with $\pi^*_P(I) > 0$ is also a possible model [21] of the
LPAD (when ignoring the probabilities, of course). Secondly, each stable model [8]
of the LPAD is such an interpretation. Moreover, in most cases, the stable model
semantics coincides precisely with this set of interpretations. Only for programs
where the same atom appears in the head of different clauses, as for instance
{(a : 0.5) ∨ (b : 0.5). a.}, can there be a difference. Indeed, this example has a
unique stable model {a}, but in our probabilistic framework $\pi^*_P(\{a, b\}) = 0.5$.
From a modeling perspective, this difference makes sense, because in an LPAD a
clause like the first one represents a kind of “experiment”, of which the disjuncts
in its head are possible outcomes. As such, there is no reason why a being true
should preclude b as a possible outcome of the experiment denoted by the first
clause.

Of course, we are not only interested in the probabilities of interpretations,
but also in the probability of a formula $\phi$ under the semantics $\pi^*_P$. This is
defined as the sum of the probabilities of the interpretations in which the formula
is true:

Definition 7. Let P be a sound LPAD in $\mathcal{P}_G$. Slightly abusing notation, for
each formula $\phi$, the probability $\pi^*_P(\phi)$ of $\phi$ according to P is the sum of the
probabilities of all interpretations in which $\phi$ holds, i.e.

$$\pi^*_P(\phi) = \sum_{I \in \mathcal{I}^{\phi}_P} \pi^*_P(I)$$

with $\mathcal{I}^{\phi}_P = \{I \in \mathcal{I}_P \mid I \models \phi\}$.
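As a worked instance of Definition 7, consider the query heads(coin) in the coin
example. Only two interpretations with non-zero probability contain
heads(coin): one in which the coin is fair and one in which it is biased. The
second term below is not computed in the text; it follows from the same
reasoning as the first (the choice made in the inapplicable first clause
contributes a factor 0.5 + 0.5 = 1).

```latex
\begin{align*}
\pi^*_P(\mathit{heads}(coin))
  &= \pi^*_P(\{\mathit{toss}(coin), \mathit{fair}(coin), \mathit{heads}(coin)\})
   + \pi^*_P(\{\mathit{toss}(coin), \mathit{biased}(coin), \mathit{heads}(coin)\}) \\
  &= 0.5 \cdot 0.9 + 0.6 \cdot 0.1 \\
  &= 0.45 + 0.06 = 0.51 .
\end{align*}
```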

Calculating such probabilities is the basic inference task of probabilistic logic 
programs. Usually, the formulas are restricted to being queries, i.e. existentially
quantified conjunctions. While inference algorithms are not the focus of this 
work, we will nevertheless briefly explain how this inference task is related to 
inference for logic programs. 

The probability of a formula is defined as the sum of the probabilities of 
all interpretations in which it is true. The probability of such an interpretation 
is, in turn, defined in terms of the probabilities of the normal logic programs 
which can be constructed from the LPAD. Hence, finding a proof for the query, 
gives us already “part” of the probability of the query. To compute the entire 
probability of the query, it suffices to find all proofs and to appropriately combine 
the probabilities associated with the heads of the clauses appearing in these 
proofs. 

In Section 4 on related work, we will discuss a formalism called the
Independent Choice Logic [20]. For this formalism, an inference algorithm has been
developed, which operates according to the principles outlined in the previous
paragraph. Furthermore, a source-to-source transformation from acyclic LPADs
to ICL exists, which allows this algorithm to also be applied to acyclic LPADs.
Moreover, as this transformation is polynomial in the size of the input program,
this shows that acyclic LPADs are in the same complexity class as ICL.

In [26], we show that the semantics presented in this section is consistent 
with that proposed in Halpern’s fundamental article [11], in which a general 
way of formalizing a certain type of probabilistic knowledge through a possible 
world semantics was introduced. 

3 Examples 

We present four examples and refer to [26] for more examples. 

A Hidden Markov Model. The Hidden Markov Model in Figure 1 can be modeled 
by the following LPAD. 



(state(s0, s(T)) : 0.7) ∨ (state(s1, s(T)) : 0.3) ← state(s0, T).
(state(s1, s(T)) : 0.8) ∨ (state(s2, s(T)) : 0.2) ← state(s1, T).
state(s2, s(T)) ← state(s2, T).
(out(a, T) : 0.2) ∨ (out(b, T) : 0.8) ← state(s0, T).
(out(b, T) : 0.9) ∨ (out(c, T) : 0.1) ← state(s1, T).
(out(b, T) : 0.3) ∨ (out(c, T) : 0.7) ← state(s2, T).
state(s0, 0).

This program corresponds nicely to the way in which one would tend to explain
the semantics of this HMM in natural language. For instance, the first clause
could be read as: “if the HMM is in state s0, then it can either go to state s1
or stay in state s0.”

It is worth noting that this 
LPAD has an infinite grounding. As such, each particular instance of this LPAD 
has a probability of zero. However, our semantics remains well-defined and still 
assigns an appropriate non-zero probability to each finite string, through an infi- 
nite sum of such zero probabilities. Moreover, the aforementioned transformation 
from LPADs to ICL is also able to deal with such programs, since it does not 
instantiate any variables. As the inference-algorithm of ICL does not need to 
compute the grounding of a program, but rather searches for “proofs” of a query 
in an SLD-like manner, this allows these probabilities to be effectively computed 
as well. 
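To make the claim about finite strings concrete, the probability of an output
string of this HMM can be computed with the textbook forward recursion. This is
a standard HMM technique, swapped in here for illustration; it is not the
LPAD-to-ICL route described in the text. The tables below are read off the
clauses of the LPAD, and the function and variable names are assumptions.

```python
from itertools import product
from math import isclose

# Transition and output probabilities read off the LPAD clauses.
TRANSITION = {"s0": {"s0": 0.7, "s1": 0.3},
              "s1": {"s1": 0.8, "s2": 0.2},
              "s2": {"s2": 1.0}}
EMISSION = {"s0": {"a": 0.2, "b": 0.8},
            "s1": {"b": 0.9, "c": 0.1},
            "s2": {"b": 0.3, "c": 0.7}}

def string_probability(symbols, start="s0"):
    """Probability that the outputs at times 0, 1, ... spell `symbols`."""
    forward = {start: 1.0}  # P(outputs so far, current state)
    for symbol in symbols:
        # Weight each state by the chance of emitting `symbol` there.
        emitted = {s: p * EMISSION[s].get(symbol, 0.0) for s, p in forward.items()}
        # Move one time step ahead; this keeps the total mass unchanged.
        forward = {}
        for state, p in emitted.items():
            for nxt, q in TRANSITION[state].items():
                forward[nxt] = forward.get(nxt, 0.0) + p * q
    return sum(forward.values())

total_length_two = sum(string_probability("".join(word))
                       for word in product("abc", repeat=2))
print(string_probability("ab"), total_length_two)
```

For example, the string "ab" requires output a in state s0 at time 0 and then
output b from either s0 or s1 at time 1, giving 0.2 · (0.7 · 0.8 + 0.3 · 0.9).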




Fig. 1. A Hidden Markov Model.

Fig. 2. A Bayesian network.



A Bayesian network. The Bayesian network in Figure 2 can also be represented 
in our formalism. This is done by explicitly enumerating the possible values for 
each node. In this way, every Bayesian network can be represented as an LPAD. 

(burg(X, t) : 0.1) ∨ (burg(X, f) : 0.9).
(earthq(X, t) : 0.2) ∨ (earthq(X, f) : 0.8).
alarm(X, t) ← burg(X, t), earthq(X, t).
(alarm(X, t) : 0.8) ∨ (alarm(X, f) : 0.2) ← burg(X, t), earthq(X, f).
(alarm(X, t) : 0.8) ∨ (alarm(X, f) : 0.2) ← burg(X, f), earthq(X, t).
(alarm(X, t) : 0.1) ∨ (alarm(X, f) : 0.9) ← burg(X, f), earthq(X, f).

Actually, this LPAD represents several “versions” of the original Bayesian
network, namely one for each instantiation of X. As such, this representation is
similar to the knowledge-based model construction formalism of Bayesian Logic
Programs [12], a first-order extension of Bayesian networks, which is discussed
in Section 4.
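As a worked example (not given in the text), the probability of the alarm going
off for any fixed instantiation x of X is obtained by summing over the four
combinations of burg and earthq, using the probabilities annotated in the
clauses above:

```latex
\begin{align*}
\pi^*_P(\mathit{alarm}(x, t))
  &= 0.1 \cdot 0.2 \cdot 1 + 0.1 \cdot 0.8 \cdot 0.8
   + 0.9 \cdot 0.2 \cdot 0.8 + 0.9 \cdot 0.8 \cdot 0.1 \\
  &= 0.02 + 0.064 + 0.144 + 0.072 = 0.3 .
\end{align*}
```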

Throwing dice. There are some board games which require a player to roll a six 
(using a standard die) before he is allowed to actually start the game itself. The 
following example shows an LPAD which defines a probability distribution on 
how long it could take a player to do this. 

(on(D, 1, s(T)) : 1/6) ∨ (on(D, 2, s(T)) : 1/6) ∨ · · · ∨ (on(D, 6, s(T)) : 1/6)
        ← time(T), die(D), ¬on(D, 6, T).
start_game(s(T)) ← time(T), on(D, 6, T).
time(s(T)) ← time(T).
time(0). die(die).

The first rule of this LPAD is the most important one. It states that if the
player has not succeeded in getting a six on his current attempt, he will have
to try again. Note that, because of the use of negation-as-failure in the body of
this clause, the atoms on(D, 1, s(T)), ..., on(D, 5, s(T)) are only needed to serve
as alternatives for on(D, 6, s(T)). As such, in the context of this example, this
clause could equivalently be written as for instance:

(on(D, 6, s(T)) : 1/6) ∨ (not_six : 5/6) ← time(T), die(D), ¬on(D, 6, T).

Moreover, instead of the atom not_six, any atom not appearing in the rest of
the program could be used. We therefore simply abbreviate such clauses by

(on(D, 6, s(T)) : 1/6) ← time(T), die(D), ¬on(D, 6, T).
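The distribution this program defines on the time of the first six can be
written down explicitly. A six at time $s^n(0)$ requires the first clause to
fire at each of the preceding n - 1 time points without producing a six, so,
writing $s^n(0)$ for n applications of s to 0, a short derivation (ours, not
taken from the text) gives a geometric distribution:

```latex
\[
\pi^*_P\bigl(\mathit{on}(die, 6, s^{n}(0))\bigr)
  = \left(\tfrac{5}{6}\right)^{n-1} \cdot \tfrac{1}{6}
  \quad (n \ge 1),
\qquad
\sum_{n \ge 1} \left(\tfrac{5}{6}\right)^{n-1} \cdot \tfrac{1}{6} = 1 .
\]
```

By the second clause, start_game($s^{n+1}(0)$) has the same probability.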

Situation calculus. LPADs can also be used to model indeterminate actions. To 
demonstrate this, we will consider a variant of the turkey shooting problem. If 
we try to shoot a healthy turkey, there are three possible outcomes: we can miss 
(with probability 0.2), we can hit but merely wound it (with probability 0.5) or 
we can immediately succeed in killing it (with probability 0.3). Trying to shoot 
a wounded turkey can have two possible effects: we can miss, in which case the 
turkey simply remains wounded (with probability 0.3 - wounded turkeys are of 
course less mobile than their healthy counterparts), or we can hit and kill it 
(with probability 0.7). If the turkey is already wounded and we wish to save our 
bullets, we can also simply wait to see whether it will succumb to its wounds 
(with probability 0.4). However, this also gives the turkey a chance to rest and 
as such it could recover from its injuries (with probability 0.1). 

The following LPAD models this problem, using the situation calculus. 

(holds(healthy, do(shoot, S)) : 0.2) ∨ (holds(wounded, do(shoot, S)) : 0.5)
        ∨ (holds(dead, do(shoot, S)) : 0.3) ← holds(healthy, S).
(holds(wounded, do(shoot, S)) : 0.3) ∨ (holds(dead, do(shoot, S)) : 0.7)
        ← holds(wounded, S).
(holds(healthy, do(wait, S)) : 0.1) ∨ (holds(wounded, do(wait, S)) : 0.5)
        ∨ (holds(dead, do(wait, S)) : 0.4) ← holds(wounded, S).
holds(dead, do(A, S)) ← holds(dead, S), action(A).
holds(Prop, do(wait, S)) ← holds(Prop, S), ¬holds(wounded, S).
holds(healthy, s0). action(wait). action(shoot).

The probability distribution defined by this LPAD specifies, for instance, that
the probability $\pi^*_P(holds(dead, do(wait, do(shoot, s0))))$ of the turkey being dead
after shooting and waiting is 0.3 + 0.5 · 0.4 = 0.5. Once again, this probability
can also be effectively computed by applying the transformation from LPADs to
ICL.
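The figure 0.5 can be reproduced by pushing a distribution over the turkey's
condition through the two actions. The sketch below is a shortcut that treats
the three conditions as mutually exclusive states of a Markov chain (which
holds here because the turkey starts out with exactly one of them); it does not
implement the LPAD semantics or the ICL transformation. The EFFECTS table is
read off the clauses, with the frame axioms supplying the entries of probability
1, and all names are assumptions.

```python
from math import isclose

# Effect of each action in each condition, read off the LPAD clauses.
EFFECTS = {
    "shoot": {"healthy": {"healthy": 0.2, "wounded": 0.5, "dead": 0.3},
              "wounded": {"wounded": 0.3, "dead": 0.7},
              "dead": {"dead": 1.0}},
    "wait": {"healthy": {"healthy": 1.0},
             "wounded": {"healthy": 0.1, "wounded": 0.5, "dead": 0.4},
             "dead": {"dead": 1.0}},
}

def after(actions, start="healthy"):
    """Distribution over conditions after performing `actions` in order."""
    dist = {start: 1.0}
    for action in actions:
        nxt = {}
        for state, p in dist.items():
            for new_state, q in EFFECTS[action][state].items():
                nxt[new_state] = nxt.get(new_state, 0.0) + p * q
        dist = nxt
    return dist

result = after(["shoot", "wait"])
print(result)
```

The dead entry combines the direct kill (0.3) with the wounded turkey
succumbing while we wait (0.5 · 0.4), as in the computation above.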

4 Related Work 

There is a large body of work concerning probabilistic logic programming. Due 
to space limitations, we refer to [25] and [26] for more details, references, and 
comparisons with LPADs. 

An important class of probabilistic logic programming formalisms are those
following the Knowledge Based Model Construction or KBMC approach. Such
formalisms allow the representation of an entire “class” of propositional models,
from which, for a specific query, an appropriate model can then be constructed
“at run-time”. This approach was initiated by Breese et al. [4] and Bacchus
[2]. Examples are: Context-Sensitive Probabilistic Knowledge Bases of Ngo and
Haddawy [19], Probabilistic Relational Models of Getoor et al. [9], and Bayesian
Logic Programs of Kersting and De Raedt [12].

A formal comparison between LPADs and Bayesian Logic Programs (BLPs) 
is given in [26]. A BLP can be seen as representing a (possibly infinite) set of 
Bayesian networks. Each ground atom represents a random variable, which can 
take on a value from a domain associated with its predicate. An implication in 
a clause of a BLP is not a logical implication, but rather an expression concern- 
ing probabilistic dependencies. This makes the reading of a BLP - at least for 
those acquainted with logic programming - less natural. Another difference is 
that, although it is possible to simulate classical negation in BLPs, they do not 
incorporate non-monotonic negation. In some cases, this can lead to longer and 
less intuitive programs. 

The most natural way of modelling the coin-example of the introduction, is 
by the following BLP: 

side(Coin) ← toss(Coin), biased(Coin).
biased(coin).
toss(coin).

With each of these clauses, a conditional probability table or CPT has to be 
associated, which defines the conditional probability of each value for a random 
variable associated with a ground instantiation of the atom in the head of the 
clause, given the values of the random variables associated with the correspond- 
ing ground instantiations of the atoms in the body. In the case of the example, 
these are as follows. The atoms toss(Coin) and biased(Coin) are abbreviated
to, respectively, t(C) and b(C).



side(C)   t(C) = t, b(C) = t   t(C) = t, b(C) = f   t(C) = f, b(C) = t   t(C) = f, b(C) = f
heads     0.6                  0.5                  0                    0
tails     0.4                  0.5                  0                    0
NA        0                    0                    1                    1

biased(coin)          toss(coin)
t    0.6              t    1
f    0.4              f    0


Note that, in order to simulate the fact that we are only interested in the
position of coins after they have been tossed, the domains of the random variables
corresponding to ground instantiations of side(C) had to be extended with the
“don’t care” value NA (not applicable).

In [26] it is formally shown that the semantics of a BLP can be expressed by
an LPAD, which makes explicit the implicit argument of each atom, i.e. its “value”,
and enumerates all the elements in the domain. This process is similar to that
which was used in Section 3 to model a Bayesian network by an LPAD. Conversely,
it is shown that quite a large subset of all LPADs can be represented as
BLPs. This is done by making each atom a boolean random variable, i.e. one with
domain {true, false}, simulating the logical semantics of clauses by appropriate
conditional probabilities and introducing for each clause a new random variable
to explicitly represent the choice between the disjuncts in its head. This
process, however, often leads to BLPs which are not very natural. For instance, the
LPAD-clause

(heads(Coin) : 0.5) ∨ (tails(Coin) : 0.5) ← toss(Coin), ¬biased(Coin).

would be transformed to the set of clauses

heads(Coin) ← toss(Coin), not_biased(Coin), ch(Coin).
tails(Coin) ← toss(Coin), not_biased(Coin), ch(Coin).
ch(Coin).

with the following CPTs: 



ch(C)
1    0.5
2    0.5

heads(C)   t(C) = t, nb(C) = t, ch(C) = 1   otherwise
t          1                                0
f          0                                1

tails(C)   t(C) = t, nb(C) = t, ch(C) = 2   otherwise
t          1                                0
f          0                                1



Another class of formalisms, besides that of KBMC, are those which grew
out of an attempt to extend logic programming with probability. Among these
formalisms, Programming in Statistical Modeling (PRISM) [23] and the
Independent Choice Logic (ICL) [20] deviate the least from classical logic programming.
ICL is a probabilistic extension of abductive logic programming. An ICL
program consists of both a logical and a probabilistic part. The logical part is an
acyclic, normal logic program. The probabilistic part consists of a set of clauses
of the form (in LPAD syntax): $(a_1 : \alpha_1) \lor \cdots \lor (a_n : \alpha_n)$. The atoms
$a_i$ in such clauses are called abducibles. Each abducible may only appear once in
the probabilistic part of an ICL program; in the logical part of the program,
abducibles may only appear in the bodies of clauses.

Syntactically, each ICL program is clearly an LPAD. In [26] it was shown
that this embedding of ICL into LPADs preserves the original semantics of ICL
(as formulated in [20]). Conversely, each acyclic LPAD can be transformed into
one in this restricted syntax [26]. This is done by creating new, artificial atoms,
which explicitly represent the process of choosing a disjunct from the head of a
clause, as is illustrated by the following ICL-version of the coin-example of the
introduction:

heads(Coin) ← toss(Coin), ¬biased(Coin), fair_heads(Coin).
tails(Coin) ← toss(Coin), ¬biased(Coin), fair_tails(Coin).
heads(Coin) ← toss(Coin), biased(Coin), biased_heads(Coin).
tails(Coin) ← toss(Coin), biased(Coin), biased_tails(Coin).
(fair_heads(Coin) : 0.5) ∨ (fair_tails(Coin) : 0.5).
(biased_heads(Coin) : 0.6) ∨ (biased_tails(Coin) : 0.4).
(fair(coin) : 0.9) ∨ (biased(coin) : 0.1).
toss(coin).

On the first author’s web site², a Prolog program can be found which performs
this transformation. As such, even though LPADs do not (yet) have an imple- 
mented inference algorithm of their own, it is already possible to solve queries 
to acyclic LPADs by using the ICL algorithm. 
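
As a rough illustration of how a query to the ICL program above could be answered, the following Python sketch enumerates every combination of the three independent choices, derives the consequences of the four logical clauses for the single constant coin, and sums the probabilities of the combinations in which the query holds. It is a brute-force sketch written for this one ground program; it is neither the ICL inference algorithm of [20] nor the transformation program mentioned above, and the names `CHOICES`, `consequences` and `probability` are assumptions made for the example.

```python
from itertools import product

# The three independent choices of the ICL coin program, grounded for "coin".
CHOICES = [
    [("fair_heads", 0.5), ("fair_tails", 0.5)],
    [("biased_heads", 0.6), ("biased_tails", 0.4)],
    [("fair", 0.9), ("biased", 0.1)],
]


def consequences(selected):
    """Atoms derived by the four logical clauses from the selected abducibles."""
    atoms = set(selected) | {"toss"}  # toss(coin) is a fact
    biased = "biased" in atoms
    if not biased and "fair_heads" in atoms:
        atoms.add("heads")
    if not biased and "fair_tails" in atoms:
        atoms.add("tails")
    if biased and "biased_heads" in atoms:
        atoms.add("heads")
    if biased and "biased_tails" in atoms:
        atoms.add("tails")
    return atoms


def probability(query):
    """Sum the weights of all selections whose consequences contain the query."""
    total = 0.0
    for selection in product(*CHOICES):
        weight = 1.0
        for _, p in selection:
            weight *= p
        if query in consequences(atom for atom, _ in selection):
            total += weight
    return total


p_heads = probability("heads")
p_tails = probability("tails")
```

Under these assumptions, the probability of heads(coin) works out to 0.9 · 0.5 + 0.1 · 0.6 = 0.51. Enumeration is exponential in the number of choices, so it only serves to make the semantics concrete.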

It should be noted that, although these two formalisms are similar in terms of 
theoretical expressive power, they are nevertheless quite different in their prac- 
tical modeling properties. Indeed, ICL (and of course the corresponding subset 
of LPADs) is ideally suited for problem domains such as diagnosis or theory re- 
vision, in which it is most natural to express uncertainty on the causes of certain 
effects. The greater expressiveness of LPADs (in the sense that LPADs allow 
more natural representations of certain types of knowledge), on the other hand, 
makes these also suited for problems such as modeling indeterminate actions, in 
which it is most natural to express uncertainty on the effects of certain causes. 
Of course, this is not surprising, as a similar relationship exists between the
non-probabilistic formalisms on which ICL and LPADs are based: [22] proves that
abductive logic programming and disjunctive logic programming are essentially
equivalent; however, history has shown that both these formalisms are valid 
ways of representing knowledge, with each having problem domains for which it 
is better suited than the other. 

LPADs are not the only probabilistic formalism based on disjunctive logic 
programming. In Many-Valued Disjunctive Logic Programs of Lukasiewicz [16]
probabilities are associated with disjunctive clauses as a whole. In this way,
uncertainty of the implication itself - and not, as is the case with LPADs, of the
disjuncts in the head - is expressed. In our work, the goal is not to represent
uncertainty about the truth of a disjunctive clause. In fact, given an LPAD and 
a model of it, all the corresponding non-probabilistic disjunctive rules are true 
in the interpretations which have non-zero probability. Instead, LPADs are to 
be used in situations where one has uncertainty about the consequence (head) 
of a given conjunction of atoms (body). 

All the works mentioned above use point probabilities. There are, however,
also a number of formalisms using probability intervals: Probabilistic Logic Programs
of Ng and Subrahmanian [18], their extension to Hybrid Probabilistic Programs
of Dekhtyar and Subrahmanian [7] and Probabilistic Deductive Databases
of Lakshmanan and Sadri [13]. Contrary to our approach, programs in these
formalisms do not define a single probability distribution, but rather a set of
possible probability distributions, which - in a sense - allows one to express
a kind of “meta-uncertainty”, i.e. uncertainty about which distribution is the
“right” one. Moreover, the techniques used by these formalisms tend to have
more in common with constraint logic programming than “normal” logic programming.

² http://www.cs.kuleuven.ac.be/~joost

We also want to mention the Stochastic Logic Programs of Muggleton and 
Cussens [6, 17]. In this formalism probabilities are attached to the selection of 
clauses in the Prolog inference algorithm, which basically results in a first-order 
version of stochastic context-free grammars. Because of this formalism’s strong
ties to Prolog, it appears to be quite different from LPADs and indeed all of the 
other formalisms mentioned here. 

More recently, Baral et al. developed P-log [3]. While the principles underlying
this formalism and its semantics seem closely related to ours, there are considerable
differences in the execution of these ideas; more specifically, P-log is not an
extension of logic programming, but rather a new language which “compiles to” 
answer set programs. As such, the precise relation between LPADs and P-log is 
currently not clear. 

5 Conclusion and Future Work 

In Section 2, Logic Programs with Annotated Disjunctions were introduced. In 
our opinion, this formalism offers a natural and consistent way of describing 
complex probabilistic knowledge in terms of a number of (independent) simple 
choices, an idea which is prevalent in, for instance, [20]. Furthermore, it does not
ignore the crucial concept of conditional probability, which underlies the entire
“Bayesian movement”, and does not deviate from the well-established and
well-known non-probabilistic semantics of first-order logic and logic programming.
Indeed, as shown in Section 2, for an LPAD P, the set of interpretations I for
which $\pi_P(I) > 0$ is a subset of the possible models of P and a (small) superset
of its stable models.

While the comparison with related work such as ICL (Section 4) showed that 
the ideas underlying this formalism and its semantics are not radically new, we 
feel it offers enough additional advantages in providing a natural representation 
of relational probabilistic knowledge, to constitute a useful contribution to the 
field of probabilistic logic programming. In future work, we hope to demonstrate 
this further, by presenting larger, real-world applications of LPADs. We also 
plan further research concerning a proof procedure and complexity analysis for 
LPADs. 

Finally, there are a number of possible extensions to the LPAD formalism 
which should be investigated. For example, it might prove useful to allow the use 
of variables in the probabilistic annotations and incorporate aggregates, in order 
to allow a more concise representation of certain basic probability distributions. 
In such a way, one would be able to express, for instance, that if one chooses
a person at random from a room in which there are m men and f women, the
probability of having chosen a man is $\frac{m}{m+f}$:

(male(C) : M/P) ∨ (female(C) : F/P) ← chosen(C),
    M = count(X, male(X)), F = count(X, female(X)), P = M + F.
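
The arithmetic behind such variable annotations can be sketched in a few lines of Python. The sketch only mirrors the two count aggregates and the sum P = M + F of the clause above; it is not an implementation of the proposed extension, and the function name `annotation_probabilities` and the sample room are hypothetical.

```python
from fractions import Fraction


def annotation_probabilities(room):
    """Return (P(male), P(female)) for a room mapping each person to a sex."""
    m = sum(1 for sex in room.values() if sex == "male")    # M = count(X, male(X))
    f = sum(1 for sex in room.values() if sex == "female")  # F = count(X, female(X))
    total = m + f                                           # P = M + F
    if total == 0:
        raise ValueError("the annotations are undefined for an empty room")
    return Fraction(m, total), Fraction(f, total)


# Hypothetical room with two men and one woman.
room = {"ann": "female", "bob": "male", "carl": "male"}
p_male, p_female = annotation_probabilities(room)
```

Exact fractions are used so that the two annotations visibly sum to one.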

Because of the logical nature of LPADs and their instance-based semantics, it 
should be fairly straightforward to add such extensions to the language in a 
natural way. In other formalisms, such as BLP or ICL, this appears to be more 
difficult.

Current approaches to mobile code safety - inspired by the technique of
Proof-Carrying Code (PCC) [4] - associate safety information (in the form of a
certificate) with programs. The certificate (or proof) is created by the code supplier
at compile time, and packaged along with the untrusted code. The consumer 
who receives the code+certificate package can then run a checker which, by a 
straightforward inspection of the code and the certificate, is able to verify the 
validity of the certificate and thus compliance with the safety policy. The main 
practical difficulty of PCC techniques is in generating safety certificates which 
at the same time: i) allow expressing interesting safety properties, ii) can be 
generated automatically, and iii) are easy and efficient to check.

We propose an automatic approach to PCC which makes use of abstract in- 
terpretation [2] techniques for dealing with the above issues. While our approach 
is general, we develop it for concreteness in the context of (Constraint) Logic 
Programming, (C)LP, because this paradigm offers a good number of advan- 
tages, especially the maturity and sophistication of the analysis tools available. 
Assertions are used to define the safety policy. Such assertions are syntactic ob- 
jects which allow expressing “abstract” - i.e. symbolic - properties over different 
abstract domains. The first step in our method then involves automatically in- 
ferring a set of safety assertions (corresponding to the analysis results), using 
abstract interpretation, and taking as a starting input the program, the pre- 
defined assertions available for library predicates, and any (optional) assertions 
provided by the user for user-defined predicates. The safety policy consists in 
guaranteeing that the safety assertions hold for the given program in the context 
of the desired abstract domain. This guarantee is provided automatically by the
inference process, and its correctness follows from the proved correctness of that process.

The certification process - i.e., the generation by the code supplier of a safety
certificate which is as small as possible - is in turn based on the idea that only
a particular subset of the analysis results computed by abstract interpretation-based
fixpoint algorithms needs to be used to play the role of certificate for attesting
program safety. In our implementation, the high-level assertion language
of [5] is used and the certificate is automatically generated from the results computed
by the goal-dependent fixpoint abstract interpretation-based analyzer of
[3]. These analysis results are represented by means of two data structures in
the output: the answer table and the arc dependency table. We show that a par- 
ticular subset of the analysis results - namely the answer table - is sufficient 
for mobile code certification. A verification condition generator computes from 
the assertions and the answer table a verification condition in order to attest 
compliance of the program with respect to the safety policy. Intuitively, the ver- 
ification condition is a conjunction of boolean expressions whose validity ensures 
the consistency of a set of assertions. The automatic validator attempts to check
its validity. If the verification condition is indeed shown to be valid, the answer
table is considered a valid certificate.

In order to retain the safety guarantees, the consumer, after receiving the program
together with the certificate from the supplier, can trust neither the code
nor the certificate. Thus, in the validation process, the consumer not only checks
the validity of the answer table received but also (re-)generates a trustworthy
verification condition, as is done by the supplier. The crucial observation in
our approach is that the validation process performed by the code consumer is
similar to the above certification process, except that the fixpoint analyzer is
replaced by an analysis checker which does not need to compute a fixpoint. It simply
checks the analysis, using an algorithm which is a much simplified one-pass analyzer.
Intuitively, since the certification process already provides the fixpoint result as 
certificate, an additional analysis pass over it cannot change the result. Thus, 
as long as the answer table is valid, a single cycle over the code validates the 
certificate. 
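
The difference between computing a fixpoint and checking one can be sketched on a toy example. The Python code below is an illustration only: it uses a small sign analysis over a three-node flow graph rather than the (C)LP analyzer of [3], and the table it produces merely plays the role of the answer table. All names (`analyze`, `check`, `EDGES`, and so on) are assumptions made for the sketch.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Signs = FrozenSet[str]   # abstract value: the set of signs that x may have
Table = Dict[str, Signs]


def identity(signs: Signs) -> Signs:
    return signs


def increment(signs: Signs) -> Signs:
    """Abstract effect of x := x + 1 on a set of signs."""
    result = set()
    for sign in signs:
        if sign == "-":
            result |= {"-", "0"}  # negative plus one is negative or zero
        else:
            result.add("+")       # zero or positive plus one is positive
    return frozenset(result)


# Toy program: x := 0; while ...: x := x + 1
NODES = ["init", "head", "body"]
ENTRY: Table = {"init": frozenset({"0"})}
EDGES: List[Tuple[str, Callable[[Signs], Signs], str]] = [
    ("init", identity, "head"),
    ("head", identity, "body"),
    ("body", increment, "head"),
]


def analyze() -> Tuple[Table, int]:
    """Supplier side: iterate until nothing changes; also count the passes."""
    table: Table = {node: ENTRY.get(node, frozenset()) for node in NODES}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for source, transfer, target in EDGES:
            merged = table[target] | transfer(table[source])
            if merged != table[target]:
                table[target] = merged
                changed = True
    return table, passes


def check(certificate: Table) -> bool:
    """Consumer side: one pass over the edges, with no iteration."""
    if any(node not in certificate for node in NODES):
        return False
    if any(not ENTRY[node] <= certificate[node] for node in ENTRY):
        return False
    return all(transfer(certificate[source]) <= certificate[target]
               for source, transfer, target in EDGES)


certificate, passes = analyze()
tampered = dict(certificate, head=frozenset({"0"}))  # drops "+" at the loop head
```

Here `analyze` needs several passes before the table stabilizes, whereas `check` visits each edge exactly once and rejects the tampered table. Note that this simple checker accepts any table that is closed under the transfer functions, not only the least one; whether a real checker should be stricter is outside the scope of this sketch.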

We believe that our proposal can bring the expressiveness and automation
which are inherent to abstract interpretation-based techniques to the area of mobile
code safety. In particular, the expressiveness of existing abstract domains
will be useful to define a wider range of safety properties. Furthermore, in the 
case of (C)LP the approach inherits the inference power and automation of the 
abstract interpretation engines developed for this paradigm. A complete description
of the method (and related techniques) can be found in [1].

Abstract. In this communication, we describe work-in-progress on “webized”
logic programs, and the use of these programs for policy formulation
and exchange in the context of conducting e-commerce.



A number of practical and theoretical issues remain to be addressed in the context
of conducting e-commerce. When agents engage in business transactions
via the Web, the notions of policy specification and implementation are fundamentally
important. For instance, when one business $b_1$ considers trading with
another business $b_2$, $b_1$ will request information from $b_2$ on things like $b_2$’s
pricing policies, discounting policies, refund policies, and lead times. Moreover, the
policy information that $b_2$ releases to $b_1$ may depend on $b_2$’s perception of $b_1$ as a
possible trading partner, e.g., whether $b_2$ believes (either from direct experience
or from experiences reported by a trusted third party) that $b_1$ is likely to be a
potential “bad debtor” or a “prompt payer”. The information that $b_2$ releases
to $b_1$ is controlled by using a usage policy specification.

Logic programs have a number of attractive features that make them suit- 
able for policy representation. For example, logic programs permit the high- 
level, declarative specification of requirements by using a language for which 
well known formal semantics are defined, and for which efficient operational 
methods have been developed. Nevertheless, extended forms of logic programs 
are desirable to better meet the requirements of e-commerce applications. To 
address this need, we introduce labeled normal clause programs. 

A labeled normal clause is a formula of the following form, where C is an atom,
$v_i$ is an empty set or a singleton that is the URI for a non-local source (e.g., an
ontology) that includes the definition of an atom in the set
$\{A_1, A_2, \ldots, A_m, B_1, B_2, \ldots, B_n\}$, and not is negation-as-failure:

$$C \leftarrow v_1 : A_1,\ v_2 : A_2,\ \ldots,\ v_m : A_m,\ \mathit{not}\ v_{m+1} : B_1,\ \mathit{not}\ v_{m+2} : B_2,\ \ldots,\ \mathit{not}\ v_{m+n} : B_n.$$

A labeled logic program is a finite set of labeled normal clauses. For most (all?) 
policy specifications in practice, labeled logic programs will be locally stratified 
and function-free.
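
One possible in-memory representation of labeled normal clauses is sketched below in Python. It is an assumption-laden illustration, not the authors' implementation: the class names, the use of `None` for the empty label, the angle-bracket rendering of URIs, and the example predicates and URI are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LabeledLiteral:
    atom: str
    source: Optional[str] = None  # URI of a non-local definition; None = empty label
    negated: bool = False         # True for a "not" (negation-as-failure) literal


@dataclass(frozen=True)
class LabeledClause:
    head: str
    body: Tuple[LabeledLiteral, ...] = ()

    def sources(self) -> List[str]:
        """URIs that would have to be resolved before evaluating the clause."""
        return sorted({lit.source for lit in self.body if lit.source is not None})

    def render(self) -> str:
        """Plain-text form of the clause (the URI notation is illustrative)."""
        def literal_text(lit: LabeledLiteral) -> str:
            label = f"<{lit.source}> : " if lit.source else ""
            return ("not " if lit.negated else "") + label + lit.atom

        if not self.body:
            return f"{self.head}."
        return f"{self.head} <- " + ", ".join(literal_text(l) for l in self.body) + "."


# Hypothetical policy: release prices to trading partners not known as bad debtors.
clause = LabeledClause(
    head="release_prices(B)",
    body=(
        LabeledLiteral("trading_partner(B)"),
        LabeledLiteral("bad_debtor(B)", source="http://example.org/credit", negated=True),
    ),
)
```

A compiler of the kind described later in the text could, for instance, walk `clause.sources()` to decide which external definitions to import; that mapping is not shown here.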

In addition to labeled logic programs, we introduce a markup language for
enabling policy information to be exchanged between agents that engage in
e-trading. Our markup language extends RuleML [2] to enable a variety of different
forms of policy information to be exchanged. We refer to this language as ERML
(viz., extended RuleML).

The most important feature of ERML is that it permits the comparison operators
in the set $\Phi = \{<, \leq, =, \neq, \geq, >\}$ and the arithmetic operators in the set
$\Gamma = \{+, -, \div, \times, \mathrm{mod}, \mathrm{abs}\}$ to be expressed by using markup. Our motivation for
using $\Gamma$ and $\Phi$, over a finite domain of natural numbers, is that much information
that is expressed in policies either needs to be represented and manipulated
in numeric form or may be equivalently represented and manipulated in numeric
form. Arithmetic is a universally accepted interlingua, and all computational
systems will implement the simple form of arithmetic that we propose for policy
specifications. Moreover, by imposing some not too restrictive conditions on
the use of comparison operators and arithmetic operators (e.g., the restriction
to ground arithmetic expressions [1]), the soundness and completeness of the
operational semantics of rule engines that process policy information can be
ensured.
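
To make the restriction to ground arithmetic concrete, the following Python sketch evaluates ground terms built from the arithmetic operators and then applies a comparison operator. It does not reproduce ERML markup, which is not shown in this abstract. The tuple encoding of terms, the ASCII operator names, the reading of division as integer division, and the rejection of negative results are all assumptions made for the example.

```python
import operator

# ASCII stand-ins for the comparison and arithmetic operator sets.
COMPARISON = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}
ARITHMETIC = {
    "+": operator.add, "-": operator.sub, "div": operator.floordiv,
    "*": operator.mul, "mod": operator.mod, "abs": abs,
}


def evaluate(term):
    """Evaluate a ground term: a natural number or a tuple (operator, args...)."""
    if isinstance(term, int) and not isinstance(term, bool):
        value = term
    elif isinstance(term, tuple) and term and term[0] in ARITHMETIC:
        value = ARITHMETIC[term[0]](*(evaluate(arg) for arg in term[1:]))
    else:
        # Variables and unknown symbols are rejected: only ground terms are allowed.
        raise ValueError(f"not a ground arithmetic term: {term!r}")
    if value < 0:
        raise ValueError(f"result {value} is outside the natural numbers")
    return value


def holds(op, left, right):
    """Decide a comparison between two ground arithmetic terms."""
    return COMPARISON[op](evaluate(left), evaluate(right))


# Hypothetical policy test: 3 * 4 >= 10 + (7 mod 5)
discount_applies = holds(">=", ("*", 3, 4), ("+", 10, ("mod", 7, 5)))
```

Because every term is ground, evaluation always terminates with a number or an error, which is the kind of property that makes sound and complete treatment of these operators plausible in a rule engine.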

Because arithmetic may be soundly implemented in the hardware or software 
used by rule engines, we claim that our approach remains within the realms of 
“pure-belief” systems [3]. Given the importance of arithmetic and comparison 
operators in practical policy formulation, we argue that these notions ought 
to be given first-class status in a markup language for policy representation. 
Moreover, we argue that procedural attachments [3], which have been proposed
for implementing the operators in $\Gamma \cup \Phi$, should only be used for
application-specific requirements, rather than for defining things like comparison operators
and arithmetic operations, which are needed for expressing policy requirements 
in general. 

In implementations of our approach, a compiler is used to transform a labeled 
logic program into a language for implementation (e.g., XSB PROLOG or SQL). 
The URIs in a labeled logic program may be mapped to import statements or 
file manipulation functions in the implementation language. Another compiler 
can perform bidirectional translations of PROLOG and SQL to/from ERML. 

To demonstrate a practical application of our approach, we show how a usage 
control model may be formulated to protect a range of Web resources from 
unauthorized access requests.